[Epic][Security Solution][Detections] Rule Execution Log - UI on the Rule Details page #101014

Closed
2 tasks done
banderror opened this issue May 31, 2021 · 5 comments
Labels

  • 8.2 candidate: considered, but not committed, for 8.2 release
  • epic
  • Feature:Rule Monitoring: Security Solution Detection Rule Monitoring area
  • Team:Detection Rule Management: Security Detection Rule Management Team
  • Team:Detections and Resp: Security Detection Response Team
  • Team: SecuritySolution: Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
  • Theme: simp_prot_mgmt: Security Solution Simplified Protection Management Theme
  • v8.2.0

Comments

banderror commented May 31, 2021

Summary

Replace the existing "Failure History" tab with an enhanced Rule Execution Log UI on the Rule Details page.

New rule execution log documents are going to have some standard ECS fields plus some custom fields, which will make them technically similar to detection alerts and source events in terms of flexibility of analysis, display in UI tables, etc. This will allow us to implement a more advanced Rule Execution Log UI that shows not only the last 5 failures, but also: all rule execution status updates (the current statuses are "going to run", "succeeded", "warning", and "failed") with no limit on their number; the current execution metrics (querying time, indexing time, gaps, etc.); any new execution metrics (if needed); any additional events with arbitrary data; and generic log messages for observability purposes.
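As a rough illustration of the kind of document described above, here is a minimal sketch. The interface and its field names (`ruleId`, `metrics`, and so on) are hypothetical and are not the actual schema; only the four status values come from this issue.

```typescript
// Hypothetical document shape: field names are illustrative, not the real schema.
type RuleExecutionStatus = 'going to run' | 'succeeded' | 'warning' | 'failed';

interface RuleExecutionLogEntry {
  '@timestamp': string; // ECS timestamp
  message: string; // ECS free-form log message
  ruleId: string; // assumed custom field
  status?: RuleExecutionStatus; // only present on status-update entries
  metrics?: {
    searchDurationMs?: number;
    indexDurationMs?: number;
    gapDurationMs?: number;
  };
}

// Counts status updates per status, e.g. to feed a simple statistics view.
function countByStatus(
  entries: RuleExecutionLogEntry[]
): Record<RuleExecutionStatus, number> {
  const counts: Record<RuleExecutionStatus, number> = {
    'going to run': 0,
    succeeded: 0,
    warning: 0,
    failed: 0,
  };
  for (const entry of entries) {
    // Generic log messages carry no status and are skipped.
    if (entry.status !== undefined) {
      counts[entry.status] += 1;
    }
  }
  return counts;
}
```

Because every entry shares the same shape, the same list could back both the log table and an aggregated view.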

Ideas that came up while chatting with @yiyangliu9286 and @xcrzx:

  • We likely need a normal log table (like the one we have now) so the user can troubleshoot issues with rule execution and find precise details about what happened and when.
  • An additional view with charts and statistics might help the user get the bigger picture, e.g. how many failures happened within a time frame.
  • It may be possible to use row renderers in the log table.

Resources

To do

First iteration (simple log UI with basic filtering and pagination, no visualizations):

banderror added the Team:Detections and Resp and Team: SecuritySolution labels on May 31, 2021
@elasticmachine

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

peluja1012 added the Team:Detection Rule Management and Feature:Rule Management labels on Sep 15, 2021
peluja1012 added the Theme: simp_prot_mgmt label on Oct 26, 2021
banderror changed the title from "[Security Solution][Detections] Rule Execution Log - UI on the Rule Details page (Draft)" to "[Security Solution][Detections] Rule Execution Log - UI on the Rule Details page" on Nov 24, 2021
banderror changed the title from "[Security Solution][Detections] Rule Execution Log - UI on the Rule Details page" to "[Meta][Security Solution][Detections] Rule Execution Log - UI on the Rule Details page" on Nov 24, 2021
banderror changed the title from "[Meta][Security Solution][Detections] Rule Execution Log - UI on the Rule Details page" to "[Epic][Security Solution][Detections] Rule Execution Log - UI on the Rule Details page" on Nov 24, 2021
banderror added the Feature:Rule Monitoring label and removed the 8.1 candidate, Feature:Rule Management, and needs design labels on Nov 24, 2021
spong commented Feb 1, 2022

Discussed #124198 with team, holding for 8.2 to further iterate on implementation and UX.

yiyangliu9286 commented Feb 24, 2022

@spong @banderror I updated the MVP design based on our discussion (link to Figma):

  • Rule execution log view - default state UI:

Screen Shot 2022-02-24 at 1 03 45 PM

There will be a tooltip on the "Show metrics columns" toggle:

Turn on to show more metrics data in this table. Metrics columns are:

  • Total alerts created
  • Total alerts detected
  • Gap duration
  • Index duration
  • Search duration
  • Schedule delay

  • To let users filter by “rule execution id”, we add an action to each rule execution row:

Screen Shot 2022-02-24 at 1 03 53 PM

  • After filtering by rule execution id, users are directed to the alerts tab with a count (edge case: if nothing is found, we show an empty state like the one currently in the product):

Screen Shot 2022-02-24 at 1 03 59 PM

spong commented Feb 24, 2022

Awesome -- thank you @yiyangliu9286! 🙂 Will implement these in #126215 and ping you when ready for review!

spong added a commit that referenced this issue Mar 28, 2022
## Summary

Resolves #119598, #119599, #101014

Test plan ([internal doc](https://docs.google.com/document/d/1-prIUGYaPHiwGA79CgSdw1926lxIPKGWWkYOUD2BM1U/edit#heading=h.womzsfdt6zt8))

Adds `Rule Execution Log` table to Rule Details page:

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/158540840-e9cddb9b-f33d-4b95-86ad-cb3e0a00cf39.gif" />
</p>


### Implementation notes

The useful metrics within `event-log` for a given rule execution are spread across a few different platform (`execute-start`, `execute`) and security (`execution-metrics`, `status-change`) events. In an effort to provide consolidated metrics per rule execution (and to avoid the many empty cells and mismatched statuses shown in the image below)

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/151933881-2e58f4d7-4cda-4528-9d44-37cb7bd5de9c.png" />
</p>



these rule execution events are aggregated by their `executionId`, and the fields from each event are then merged. This PR was reworked to take advantage of the new event-log aggregation support added in #126948, and is no longer implemented as an in-memory aggregation on the server side.
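To make the merge semantics concrete, here is a minimal in-memory sketch of grouping events by `executionId` and merging their fields. This is for illustration only: the PR itself relies on the event-log aggregation support rather than in-memory code, and the `ExecutionEvent` and `MergedExecution` types and their field names are hypothetical.

```typescript
// Hypothetical event shape; the four action names come from the notes above.
interface ExecutionEvent {
  executionId: string;
  action: 'execute-start' | 'execute' | 'execution-metrics' | 'status-change';
  fields: Record<string, unknown>;
}

interface MergedExecution {
  executionId: string;
  fields: Record<string, unknown>;
}

// Groups events by executionId and merges their fields into one row each.
// On a key collision, the later event in the input wins.
function mergeByExecutionId(events: ExecutionEvent[]): MergedExecution[] {
  const byId: Record<string, MergedExecution> = {};
  for (const event of events) {
    const existing = byId[event.executionId];
    byId[event.executionId] = {
      executionId: event.executionId,
      fields: { ...(existing ? existing.fields : {}), ...event.fields },
    };
  }
  return Object.keys(byId).map((id) => byId[id]);
}
```

Each resulting row carries the metrics and status from every event of that execution, which is what removes the empty cells from the table.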

* Due to restrictions around supplying search filters that may match multiple sub-agg buckets and missing data ([see discussion here](https://github.com/elastic/kibana/pull/127339/files#r825240516)), we decided to disable the search bar for the time being. We have both a near-term (writing a single rollup event) and a long-term (ES|QL) solution that will allow us to re-enable this functionality.

* Note: since a `terms` agg is used to fetch all execution events, an upper bound must be set. See [this discussion](https://github.com/elastic/kibana/pull/127339/files#r823035420) for more details. For the time being, this max is set to `1000` events, and the total cardinality of execution events is returned in `total` so the UI can inform users that they should narrow their search further to better isolate and find possible issues. This should be a reasonable constraint for almost all rules: for a rule executing every 5 minutes, 1000 executions cover over 3 days of execution time.
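
The upper bound and the coverage arithmetic above can be sketched as follows. This is a rough illustration, not the code in this PR: the field name in `EXECUTION_ID_FIELD` and all function names are assumptions, and the aggregation body only shows the general `terms` plus `cardinality` pattern.

```typescript
// Assumed field name for the execution id; not confirmed by this PR description.
const EXECUTION_ID_FIELD = 'kibana.alert.rule.execution.uuid';
const MAX_EXECUTION_EVENTS = 1000;

// Bounded terms agg for the rows, plus a cardinality agg for the true total.
function buildExecutionAggs(maxExecutions: number = MAX_EXECUTION_EVENTS) {
  return {
    totalExecutions: { cardinality: { field: EXECUTION_ID_FIELD } },
    executions: { terms: { field: EXECUTION_ID_FIELD, size: maxExecutions } },
  };
}

// How many days of history the bound covers for a given rule interval.
function coverageInDays(maxExecutions: number, intervalMinutes: number): number {
  return (maxExecutions * intervalMinutes) / (60 * 24);
}

// True when the UI should ask the user to narrow the search.
function shouldNarrowSearch(
  total: number,
  maxExecutions: number = MAX_EXECUTION_EVENTS
): boolean {
  return total > maxExecutions;
}
```

For a 5-minute interval, `coverageInDays(1000, 5)` is 5000 / 1440, or roughly 3.47 days, which matches the "over 3 days" estimate.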

<p align="center">
  <img width="700" src="https://user-images.githubusercontent.com/2946766/159045563-966896b4-3cd1-475d-9f0e-c2d300683546.png" />
</p>


The `Filter for alerts` action will be available on all `Succeeded`/`Partial Failure` executions, even if no alerts were generated, until #126210 is merged and we can start returning the alert count. At that point we can programmatically enable/disable this action based on the alert count.



<p align="center">
  <img width="300" src="https://user-images.githubusercontent.com/2946766/159051762-e2f97ba4-4ce1-4f67-8ae1-395e4b191cab.png" />
</p>
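
The enable/disable rule for the `Filter for alerts` action could look roughly like this once an alert count is available. The function name, the `ExecutionStatus` type, and the `Failed` status value are hypothetical; only `Succeeded` and `Partial Failure` are taken from the notes above.

```typescript
// 'Failed' is an assumed third status, included only to show a disabled case.
type ExecutionStatus = 'Succeeded' | 'Partial Failure' | 'Failed';

// alertCount is optional: undefined models the state before the count is returned.
function isFilterForAlertsEnabled(
  status: ExecutionStatus,
  alertCount?: number
): boolean {
  const eligible = status === 'Succeeded' || status === 'Partial Failure';
  if (!eligible) {
    return false;
  }
  // Without a count, keep the action available on every eligible execution.
  if (alertCount === undefined) {
    return true;
  }
  return alertCount > 0;
}
```

Keeping the count optional lets the same check serve both the interim behavior and the count-based behavior.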