Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Shared Fleet agent policy action #76013

Merged
merged 16 commits into from
Sep 14, 2020

Conversation

nchaulet
Copy link
Member

@nchaulet nchaulet commented Aug 26, 2020

Summary

Resolve #68956

Currently during a config change (or during the first checkin) we create one saved object for each agent, this PR introduce a new concept of PolicyAction that can be shared between agents.

Done in this PR

  • introduce AgentPolicyAction an action that is shared between all agent assigned to a policy
  • add some cache to fetch action in the acknowledge handler
  • introduce ack_data to avoid fetching the whole config in the acknowledge handler.

Details

We now have two kind of action:
AgentAction that target individual agent
AgentPolicyAction that target all the agent assigned to a policy (support only the CONFIG_CHANGE action for now)

Also I introduce ack_data that are data useful to acknowledge an action, this allow to avoid decrypting the whole config every-time an agent acknowledge an action.

This PR introduce a lot of new types, let me know if you have better naming suggestions.
Also I did some cleaning by removing the action type we do not support.

Performance impact

While enrolling 200 agents, than performing a checkin and a ack you can notice the first checkin checkin_first_time_duration is a lot faster

Before

ack_duration..................: avg=2.11s    min=2.05s med=2.09s max=2.28s  p(90)=2.23s  p(95)=2.28s 
checkin_first_time_duration...: avg=2.24s    min=2.06s med=2.09s max=3.09s  p(90)=3.06s  p(95)=3.09s 

After

ack_duration..................: avg=2.02s   min=2.01s med=2.02s max=2.04s  p(90)=1.04s p(95)=2.04s
checkin_first_time_duration...: avg=1.14s    min=1.01s    med=1.04s max=2.14s  p(90)=1.17s p(95)=2.12s   

@nchaulet nchaulet self-assigned this Aug 31, 2020
@nchaulet nchaulet added Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0 v8.0.0 release_note:skip Skip the PR/issue when compiling release notes labels Aug 31, 2020
@@ -21,28 +21,53 @@ export type AgentStatus =
| 'unenrolling'
| 'degraded';

export type AgentActionType = 'CONFIG_CHANGE' | 'DATA_DUMP' | 'RESUME' | 'PAUSE' | 'UNENROLL';
export type AgentActionType = 'CONFIG_CHANGE' | 'UNENROLL';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some cleanup as currently we only support this two actions

@ph ph self-assigned this Aug 31, 2020
@nchaulet nchaulet changed the title [Ingest Manager] Shared policy action [Ingest Manager] Shared Fleet agebt policy action Aug 31, 2020
@nchaulet nchaulet force-pushed the feature-share-policy-action branch 2 times, most recently from b8a3bed to 82009b0 Compare August 31, 2020 19:41
@nchaulet nchaulet changed the title [Ingest Manager] Shared Fleet agebt policy action [Ingest Manager] Shared Fleet agent policy action Aug 31, 2020
@nchaulet nchaulet marked this pull request as ready for review August 31, 2020 20:26
@nchaulet nchaulet requested a review from a team August 31, 2020 20:26
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@nchaulet
Copy link
Member Author

nchaulet commented Sep 1, 2020

@elasticmachine merge upstream

@ph ph removed their assignment Sep 1, 2020
@ph
Copy link
Contributor

ph commented Sep 1, 2020

Not sure why I was assigned to this. I blame zube.

@nchaulet
Copy link
Member Author

nchaulet commented Sep 9, 2020

@EricDavisX This one introduce a lot of change and it's probably good candidate for some manual QA :)

@EricDavisX
Copy link
Contributor

@rahulgupta-qasource - can you pull Nicolas' fork/branch above and run Kibana in source with latest 8.0 / master ES and Agent SNAPSHOT builds and run some tests please?

Nicolas and I discussed changes and want to target the tests as such:

Set up 2 persistent Windows, and 1 persistent Linux Agent & Endpoint and see them call in to the default config.

  • Then create a new config and change them all to that, without Endpoint.
  • Then create 3 new configs and change them each to a new one, with Endpoint as fast as you can in the UI (use 2 browsers or enlist a friend to login to Kibana to help!)
  • Then re-start all the hosts at the same time and see them all come back on line.
  • Then unenroll all 3 and see them go off-line
  • then re-enroll them all again to different Agent policies.

Note that viewing the Agent in Fleet and seeing it on-line and config changes is enough validation, it doesn't require a deep validation for now.

Then we can do some interesting tests, to try harder to break it:

  • config-changes when the agent is unreachable / off-line (do you have a set way to easily simulate that?). It should pick up the change when its comes on line

  • test when multiple config changes come in near the same time (maybe 2 different Admins make changes to the same policy, which is a bad idea, but it can mimic real world potential.

  • any other creative nuance you can think of, anything based on our test suites and how config changes are validated.

Let us know how it goes, thank you in advance.

@ghost
Copy link

ghost commented Sep 10, 2020

Hi @EricDavisX

Thank you for the update.

Could you please share the environment w.r.t above Nicolas fork/branch to validate this ticket.

Moreover, Error 'Unable to initialize Ingest Manager: Bad Request' is displayed on clicking 'Ingest Manager' link on 8.0.0-SNAPSHOT Kibana. We have reported bug #77133 for the same.

Please let us know if anything is missing from our end.

@EricDavisX
Copy link
Contributor

I'll discuss testing with Rahul offline and we'll confirm back.

@@ -15,12 +22,37 @@ export async function createAgentAction(
soClient: SavedObjectsClientContract,
newAgentAction: Omit<AgentAction, 'id'>
): Promise<AgentAction> {
const so = await soClient.create<AgentActionSOAttributes>(AGENT_ACTION_SAVED_OBJECT_TYPE, {
return createAction(soClient, newAgentAction);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to overload createAgentAction or have it accept different types instead of having to create the new functions createAgentPolicyAction and createAction?

Copy link
Contributor

@jfsiii jfsiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice overview in the description and the code is clear. I left some comments and questions but nothing to prevent a 🚀

@EricDavisX
Copy link
Contributor

I'll discuss testing with Rahul offline and we'll confirm back.
Rahul is not able to easily (quickly) stand up Kibana from source to test this, and I don't have bandwidth to easily help cover it. So we can do some testing prior and some testing after merge. Please do cite what you can cover and we'll manage the rest after check-in I guess?

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

Saved Objects .kibana field count

id value diff baseline
fleet-agent-actions 9 +3 6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@nchaulet nchaulet merged commit 61951a5 into elastic:master Sep 14, 2020
@nchaulet nchaulet deleted the feature-share-policy-action branch September 14, 2020 01:09
nchaulet added a commit to nchaulet/kibana that referenced this pull request Sep 14, 2020
jloleysens added a commit to jloleysens/kibana that referenced this pull request Sep 14, 2020
…s-for-710

* 'master' of github.com:elastic/kibana: (65 commits)
  Separate url forwarding logic and legacy services (elastic#76892)
  Bump yargs-parser to v13.1.2+ (elastic#77009)
  [Ingest Manager] Shared Fleet agent policy action (elastic#76013)
  [Search] Re-add support for aborting when a connection is closed (elastic#76470)
  [Search] Remove long-running query pop-up (elastic#75385)
  [Monitoring] Fix UI error when alerting is not available (elastic#77179)
  do not log plugin id format warning in dist mode (elastic#77134)
  [ML] Improving client side error handling (elastic#76743)
  [Alerting][Connectors] Refactor IBM Resilient: Generic Implementation (phase one) (elastic#74357)
  [Docs] some basic searchsource api docs (elastic#77038)
  add  cGroupOverrides to the legacy config (elastic#77180)
  Change saved object bulkUpdate to work across multiple namespaces (elastic#75478)
  [Security Solution][Resolver] Replace Selectable popover with badges (elastic#76997)
  Removing ml-state index from archive (elastic#77143)
  [Security Solution] Add unit tests for histograms (elastic#77081)
  [Lens] Filters aggregation  (elastic#75635)
  [Enterprise Search] Update WS Overview logic to use new config data (elastic#77122)
  Cleanup type output before building new types (elastic#77211)
  [Security Solution] Use safe type in resolver backend (elastic#76969)
  Use proper lodash syntax (elastic#77105)
  ...

# Conflicts:
#	x-pack/plugins/index_lifecycle_management/public/application/sections/edit_policy/components/node_allocation.tsx
@EricDavisX
Copy link
Contributor

we're running the testing now in a separate test ticket, just fyi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ingest Manager] Share CONFIG_CHANGE actions between agents
7 participants