Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Keep Endpoint policies up to date with license changes #83992

Merged
merged 9 commits into from
Dec 2, 2020

Conversation

pzl
Copy link
Member

@pzl pzl commented Nov 20, 2020

Summary

Upon license downgrade, any features enabled in endpoint policy that are no longer allowed by the change in license tier are reset and/or disabled

Depends on #83972 (merged)

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@pzl pzl added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Feature:Endpoint Elastic Endpoint feature v7.11.0 Team:Defend Workflows “EDR Workflows” sub-team of Security Solution labels Nov 20, 2020
@pzl pzl force-pushed the policy-change-check-ep-features branch 2 times, most recently from 4d4a524 to e56e436 Compare November 25, 2020 02:08
@pzl pzl marked this pull request as ready for review November 25, 2020 13:56
@pzl pzl requested review from a team as code owners November 25, 2020 13:56
@elasticmachine
Copy link
Contributor

Pinging @elastic/endpoint-app-team (Feature:Endpoint)

@pzl
Copy link
Member Author

pzl commented Nov 25, 2020

@elasticmachine merge upstream

@pzl pzl force-pushed the policy-change-check-ep-features branch 2 times, most recently from 5933185 to 178752e Compare November 25, 2020 15:52
x-pack/plugins/security_solution/common/license/license.ts Outdated Show resolved Hide resolved
} from '../../../../common/license/policy_config';
import { licenseService } from '../../../lib/license/license';

export class PolicyWatcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PolicyWatcher confused me a little because it did not mention that it's really a license watcher. Maybe LicenseWatcher would be more appropriate?

(ps. I can see this being used by other things in the future that might not be Policy related)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(two hard problems joke here)

This is in server/endpoint/lib/policy, and it's job is to watch license changes, in order to keep policy up to date. I'm open to better names. LicenseWatcher is a possibility, but note that our other class, LicenseService is copy-pasted from about 4 other plugins that have the same class. But some of them call that LicenseWatcher. Wanted to prevent that confusion.

Also I do agree we may want other side-effects to license changes that may not be policy related. But those should create their own subscription via LicenseService observable. Not use this. Observables are designed to have anyone subscribe who also needs it, not collectively boggarting off of one sub.

policyConfig,
licenseService
);
this.policyService.update(this.soClient, policy.id, policy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... Should you be throttling here? You could be generating several calls in parallel/rapid succession.

Also - you need to account for failures here for two reasons:

  1. to ensure that the remainder of the loop succeeds. You also don't want to cause the outer look (do{} while()) to throw, I don't think (not sure what that will do to the observable calling logic).
  2. to account for update conflicts, which may be the case if an update is done right after you retrieve the list of items or by other Kibana instance that may also be doing this same thing. I think we said maybe you do 1 attempt after a failure and then just log it if it still fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an accepted practice for how to throttle? This is certainly could be firing off a lot of update calls.

If I wrap update in a try/catch, do I now need to await update to make sure it's actually caught? If I'm now awaiting, do I still need to throttle since they are no longer in parallel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re:

Do we have an accepted practice for how to throttle?

I did a little of this in Fleet using Bluebird - see here

await bluebird.map(
items,
(agentPolicy: GetAgentPoliciesResponseItem) =>
listAgents(soClient, {
showInactive: false,
perPage: 0,
page: 1,
kuery: `${AGENT_SAVED_OBJECT_TYPE}.policy_id:${agentPolicy.id}`,
}).then(({ total: agentTotal }) => (agentPolicy.agents = agentTotal)),
{ concurrency: 10 }
);

Being that you are paging through, you might able to use a variation of that approach (chain each page to do concurrent updates, then await at the end).

}
}

public async watch(license: ILicense) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At one point we had talked about possibly doing the "migration" slightly delayed from when the observable triggers in order to try and account for edge case where requests might be in flight that would override our changes here. Delaying the migration would allow for existing one to complete ++ any future ones would be auto-rejected by the API handles, so migrating should be able to account for all records.

Also - do you need to protect against the license changing while a migration is in flight?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I was part of any of these discussions. We want to add a sleep here?

Do we expect other things changing at the same moment the license changes? What's this about migrations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re:

Do we expect other things changing at the same moment the license changes? What's this about migrations?

My understanding is that changing of license can happen while the system is running - there is downtime when it happens. It just happens

re:

What's this about migrations?

Sorry for the confusion. In my mind, what this service here is doing is a migration - thats what I mean.

@pzl pzl force-pushed the policy-change-check-ep-features branch from 178752e to b9ade22 Compare November 25, 2020 18:32
@pzl pzl force-pushed the policy-change-check-ep-features branch from b9ade22 to 6113014 Compare November 25, 2020 20:33
}
}

public async watch(license: ILicense) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re:

Do we expect other things changing at the same moment the license changes? What's this about migrations?

My understanding is that changing of license can happen while the system is running - there is downtime when it happens. It just happens

re:

What's this about migrations?

Sorry for the confusion. In my mind, what this service here is doing is a migration - thats what I mean.

} catch (e) {
// try again for transient issues
try {
await this.policyService.update(this.soClient, policy.id, policy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the problem encountered as a version conflict error, then you can't just "try" again with the same payload. You will need to retrieve the record again and redo updates, and then use that to update record for the update.

One way to test this with a running kibana:

above line 102, update policy.version to something like 123. That update should fail for the version missmatch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given our most recent conversation, will you leave this "retry" here or remove it and address it in the subsequent issue?

I'm ok either way

@pzl
Copy link
Member Author

pzl commented Dec 1, 2020

@elasticmachine merge upstream

Copy link
Contributor

@paul-tavares paul-tavares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given our recent off-PR conversation, I'm 👍

Thanks @pzl

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Distributable file count

id before after diff
default 43205 43206 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pzl pzl merged commit e194434 into elastic:master Dec 2, 2020
pzl added a commit to pzl/kibana that referenced this pull request Dec 2, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 3, 2020
* master: (40 commits)
  fix: 🐛 don't add separator befor group on no main items (elastic#83166)
  [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323)
  [APM] Add APM agent config options (elastic#84678)
  Fixed a11y issue on rollup jobs table selection (elastic#84567)
  [Discover] Refactor getContextUrl to separate file (elastic#84503)
  [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654)
  [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749)
  [Lens] Migrate legacy es client and remove total hits as int (elastic#84340)
  Improve logging pipeline in @kbn/legacy-logging (elastic#84629)
  Catch @hapi/podium errors (elastic#84575)
  [Discover] Unskip date histogram test (elastic#84727)
  Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791)
  [Enterprise Search] Fix schema errors button (elastic#84842)
  [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589)
  [Maps] Always initialize routes on server-startup (elastic#84806)
  [Fleet] EPM support to handle uploaded file paths (elastic#84708)
  [Snapshot Restore] Fix initial policy form state (elastic#83928)
  Upgrade Node.js to version 14 (elastic#83425)
  [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992)
  [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006)
  ...
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 3, 2020
* master: (236 commits)
  fix: 🐛 don't add separator befor group on no main items (elastic#83166)
  [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323)
  [APM] Add APM agent config options (elastic#84678)
  Fixed a11y issue on rollup jobs table selection (elastic#84567)
  [Discover] Refactor getContextUrl to separate file (elastic#84503)
  [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654)
  [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749)
  [Lens] Migrate legacy es client and remove total hits as int (elastic#84340)
  Improve logging pipeline in @kbn/legacy-logging (elastic#84629)
  Catch @hapi/podium errors (elastic#84575)
  [Discover] Unskip date histogram test (elastic#84727)
  Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791)
  [Enterprise Search] Fix schema errors button (elastic#84842)
  [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589)
  [Maps] Always initialize routes on server-startup (elastic#84806)
  [Fleet] EPM support to handle uploaded file paths (elastic#84708)
  [Snapshot Restore] Fix initial policy form state (elastic#83928)
  Upgrade Node.js to version 14 (elastic#83425)
  [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992)
  [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006)
  ...
pzl added a commit that referenced this pull request Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Endpoint Elastic Endpoint feature release_note:skip Skip the PR/issue when compiling release notes Team:Defend Workflows “EDR Workflows” sub-team of Security Solution v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants