Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[alerts] add executionStatus to event log doc for action execute #82401

Merged
merged 2 commits into from
Nov 12, 2020

Conversation

pmuellr
Copy link
Member

@pmuellr pmuellr commented Nov 3, 2020

Summary

resolves #79785

Until now, the execution status was not available in the the event
log document for the execute action. In this PR we add it.

The event log is extended to add the following fields:

  • kibana.alerting.status - from executionStatus.status
  • event.reason - from executionStatus.error.reason

The date from the executionStatus and start date in the event
log will be set to the same value.

Previously, errors encountered before trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated. Now they will.

Checklist

@pmuellr pmuellr force-pushed the alerts/log-execution-status branch 2 times, most recently from c356721 to 3ca7c23 Compare November 5, 2020 02:08
@pmuellr pmuellr force-pushed the alerts/log-execution-status branch 2 times, most recently from eb34d38 to 52bb9c0 Compare November 8, 2020 22:22
Copy link
Member Author

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted a couple of anomolies in the current tests

resolves elastic#79785

Until now, the execution status was available in the the event
log document for the execute action.  In this PR we add it.

The event log is extended to add the following fields:

- `kibana.alerting.status` - from executionStatus.status
- `event.reason`           - from executionStatus.error.reason

The date from the executionStatus and start date in the event
log will be set to the same value.

Previously, errors encountered while trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated.  Now they will.

In addition, there were a few places where events that could
have had the action group in them did not, and one where the
instance id was undefined - those were fixed up.
@pmuellr pmuellr added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) release_note:skip Skip the PR/issue when compiling release notes v7.11 v8.0.0 labels Nov 10, 2020
@pmuellr pmuellr marked this pull request as ready for review November 10, 2020 17:27
@pmuellr pmuellr requested a review from a team as a code owner November 10, 2020 17:27
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Copy link
Member Author

pmuellr commented Nov 11, 2020

Wanted to point out some additional issues to open after working with this code; unrelated to this PR directly, but straight-forward and we should clean these up.

  • The generated config schema generated for events is here:

    export const EventSchema = schema.maybe(
    schema.object({
    '@timestamp': ecsDate(),
    It has an unfortunate schema.maybe() at the top which makes the entire type undefine-able, which can be painful to deal with. I looked at removing it - it's easy. But we should also go change all the places where we would then be excessively checking the value for undefined, and I think that's a fair number of places, and didn't want to clutter up this PR.

  • Poking around the action group usage, I realized it wasn't in the top-level variables, but should be:

    interface TransformActionParamsOptions {
    alertId: string;
    alertName: string;
    spaceId: string;
    tags?: string[];
    alertInstanceId: string;
    actionParams: AlertActionParams;
    alertParams: AlertTypeParams;
    state: AlertInstanceState;
    context: AlertInstanceContext;
    }
    I updated issue [Alerting Core] Add timestamp to provided template values #67389 with a comment about this field.

@@ -184,11 +184,15 @@ describe('Task Runner', () => {
expect(eventLogger.logEvent).toHaveBeenCalledTimes(1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes at the top of this file are a dog's breakfast, sorry. The order the events get logged has changed, where the execute event is written AFTER all the other events now, but post-dated to the start of the execution process (and will be <= the other events generated), since we now have to write it AFTER all the execution processing. So, the diff is not very pleasing ...

@@ -153,7 +155,8 @@ export class TaskRunner {
alert: SanitizedAlert,
params: AlertExecutorOptions['params'],
executionHandler: ReturnType<typeof createExecutionHandler>,
spaceId: string
spaceId: string,
event: Event
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now pass a partially populated event in to these "inner" functions of the execution logic, as we write the event at the "top level" of the execution logic, instead of in this "inner" one.

@@ -166,24 +169,10 @@ export class TaskRunner {
alertRawInstances,
(rawAlertInstance) => new AlertInstance(rawAlertInstance)
);
const originalAlertInstances = cloneDeep(alertInstances);
Copy link
Member Author

@pmuellr pmuellr Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need the last set of alert instances (from the previous execution), so we can get the action group for "resolved" events, not just the original instance ids. Also, need to clone this, as alertInstances is mutated during the processing.

@@ -45,6 +45,10 @@
"outcome": {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the two fields from the alert execution status, that weren't already populated somehow in the event, so added as new fields, in what I think are appropriate places - reason is a pre-existing. ECS field, but status is a new field in the custom ECS kibana.alertingfield

@@ -102,16 +102,16 @@ describe('EventLogger', () => {
event: { provider: 'test-provider', action: 'a' },
});

const ignoredTimestamp = '1999-01-01T00:00:00Z';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are now "post-dating" the execute event log docs, we need to allow the timestamp to be overridable when we merge fixed and initial event fields - previously you could NOT override it. That logic is in the file below, x-pack/plugins/event_log/server/event_logger.ts

@@ -0,0 +1,84 @@
/*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to test a decrypt error status in the event log, we need to run with security on - all the other eventLog-specific tests are in the spaces_only tests, so we reference some of the utilities used there via import.

}

function validateEvent(event: IValidatedEvent, params: ValidateEventLogParams): void {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this diff section, best viewed with ignore whitespace, as I lifted some functions out of a function scope, to make them top-level, so they could be exported, so they could be reused by the security_and_spaces tests

@pmuellr
Copy link
Member Author

pmuellr commented Nov 11, 2020

I've added a number of comments to the PR as guide posts - lots of stuff that seems a bit obscure in this PR :-)

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Verified that for event log where action = execute, there is now a status field with executionStatus and that if the status is error, a reason is populated.

Copy link
Contributor

@dhurley14 dhurley14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is awesome thanks very much! Tested it out locally I'm hoping to switch over from using our rule status saved object to just reading errors through the getAlertInstanceSummary on the alertsClient. Looks like that will work. This will really help detections clean things up in our own codebase.

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pmuellr
Copy link
Member Author

pmuellr commented Nov 12, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pmuellr pmuellr merged commit c3e5794 into elastic:master Nov 12, 2020
pmuellr added a commit to pmuellr/kibana that referenced this pull request Nov 12, 2020
…stic#82401)

resolves elastic#79785

Until now, the execution status was available in the the event
log document for the execute action.  In this PR we add it.

The event log is extended to add the following fields:

- `kibana.alerting.status` - from executionStatus.status
- `event.reason`           - from executionStatus.error.reason

The date from the executionStatus and start date in the event
log will be set to the same value.

Previously, errors encountered while trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated.  Now they will.

In addition, there were a few places where events that could
have had the action group in them did not, and one where the
instance id was undefined - those were fixed up.
gmmorris added a commit to gmmorris/kibana that referenced this pull request Nov 12, 2020
* master:
  [Advanced Settings] Introducing telemetry (elastic#82860)
  [alerts] add executionStatus to event log doc for action execute (elastic#82401)
  Add additional sources routes (elastic#83227)
  [ML] Persisted URL state for the "Anomaly detection jobs" page (elastic#83149)
  [Logs UI] Add pagination to the log stream shared component (elastic#81193)
  [Index Management] Add an index template link to data stream details (elastic#82592)
  Add maps_oss folder to code_owners (elastic#83204)
  fix truncation issue (elastic#83000)
  [Ingest Manger] Move asset getters out of registry (elastic#83214)
pmuellr added a commit that referenced this pull request Nov 12, 2020
) (#83289)

resolves #79785

Until now, the execution status was available in the the event
log document for the execute action.  In this PR we add it.

The event log is extended to add the following fields:

- `kibana.alerting.status` - from executionStatus.status
- `event.reason`           - from executionStatus.error.reason

The date from the executionStatus and start date in the event
log will be set to the same value.

Previously, errors encountered while trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated.  Now they will.

In addition, there were a few places where events that could
have had the action group in them did not, and one where the
instance id was undefined - those were fixed up.
gmmorris added a commit to gmmorris/kibana that referenced this pull request Nov 12, 2020
… alerts/action-groups-as-conditions

* origin/alerts/stack-alerts-public: (91 commits)
  removed import from plugin code as it causes FTR to fail
  [Advanced Settings] Introducing telemetry (elastic#82860)
  [alerts] add executionStatus to event log doc for action execute (elastic#82401)
  Add additional sources routes (elastic#83227)
  [ML] Persisted URL state for the "Anomaly detection jobs" page (elastic#83149)
  [Logs UI] Add pagination to the log stream shared component (elastic#81193)
  [Index Management] Add an index template link to data stream details (elastic#82592)
  Add maps_oss folder to code_owners (elastic#83204)
  fix truncation issue (elastic#83000)
  [Ingest Manger] Move asset getters out of registry (elastic#83214)
  make defaulted field non maybe
  Remove unused asciidoc file (elastic#83228)
  [Lens] Remove background from lens embeddable (elastic#83061)
  [Discover] Unskip flaky tests based on discover fixture index pattern (elastic#82991)
  Removing unnecessary trailing slash in CODEOWNERS
  Trying to fix CODEOWNERS again, where was a non-existent team prior (elastic#83236)
  Trying to fix CODEOWERS, missing a starting slash (elastic#83233)
  skip flaky suite (elastic#83231)
  Add enzyme rerender test helper (elastic#83208)
  Move Elasticsearch type definitions out of APM (elastic#83081)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backported Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Alerting] Write executionStatus property to kibana event log
6 participants