[Event Log] Add support for aggregations #125645

Closed
banderror opened this issue Feb 15, 2022 · 11 comments
Labels
  • 8.2 candidate - considered, but not committed, for 8.2 release
  • Feature:EventLog
  • Team:Detection Rule Management - Security Detection Rule Management Team
  • Team:Detections and Resp - Security Detection Response Team
  • Team:ResponseOps - Label for the ResponseOps team (formerly the Cases and Alerting teams)
  • Team: SecuritySolution - Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

Comments

@banderror
Contributor

banderror commented Feb 15, 2022

Summary

In order to support various features in Security Solution (e.g. implementation of the Rule Execution Log UI and the Detection Engine health endpoint), we need to be able to execute aggregations on top of events in Event Log.

We have the following specific needs:
Edited to cross out the requirements addressed by #126948

  • We should be able to aggregate events across all Security rules. Examples:
    • aggregated execution metrics across all detection rules and per rule type (for each metric: median, p95, p99) over a specified time range
    • top 10 rules by each execution metric
  • We need to be able to query Event Log and aggregate over its events even if the rules that generated the events have been deleted and their saved objects don't exist anymore.
  • We shouldn't have restrictions on the fields we can sort by.
  • We should be able to aggregate events of a single given rule. Examples:
    • get buckets aggregated by kibana.alert.rule.execution.uuid, and all events in the bucket (including our Security events and Framework events)
    • for each event.action we are interested in, calculate aggregations specific to this event type (e.g. given event.action = "execution-metrics" and a specified time range, calculate percentiles for each metric we write in this event type)
  • We need to be able to combine aggregations with filters in a single request to Event Log.
  • We need support for sorting by multiple fields.
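To make the cross-rule requirements above more concrete, here is a rough sketch of a single search body (plain Elasticsearch query DSL built in TypeScript) that computes median, p95 and p99 for each metric across all rules, the same percentiles per rule type, and the top 10 rules by each metric over a time range. The field names `rule.id`, `rule.category` and the `kibana.alert.rule.execution.metrics.` prefix are assumptions for illustration, not confirmed Event Log mappings; ranking "top" rules by the metric's average is an arbitrary choice, and the query is not scoped to Security rule types.

```typescript
// A loosely typed subset of the Elasticsearch query DSL for this sketch
// (not the official @elastic/elasticsearch client types).
type AggDef = Record<string, unknown>;

interface SearchBody {
  size: number;
  query: Record<string, unknown>;
  aggs: Record<string, AggDef>;
}

// ASSUMPTION: hypothetical field names; check them against the real Event Log mappings.
const RULE_ID_FIELD = "rule.id";
const RULE_TYPE_FIELD = "rule.category";
const METRIC_PREFIX = "kibana.alert.rule.execution.metrics.";

// median, p95 and p99
const PERCENTS = [50, 95, 99];

// One `percentiles` aggregation per metric.
function percentileAggs(metrics: string[]): Record<string, AggDef> {
  const aggs: Record<string, AggDef> = {};
  for (const metric of metrics) {
    aggs[`${metric}_percentiles`] = {
      percentiles: { field: METRIC_PREFIX + metric, percents: PERCENTS },
    };
  }
  return aggs;
}

// One `terms` aggregation per metric: the top N rules ordered by the metric's average.
function topRulesAggs(metrics: string[], topN: number): Record<string, AggDef> {
  const aggs: Record<string, AggDef> = {};
  for (const metric of metrics) {
    aggs[`top_rules_by_${metric}`] = {
      terms: { field: RULE_ID_FIELD, size: topN, order: { avg_value: "desc" } },
      aggs: { avg_value: { avg: { field: METRIC_PREFIX + metric } } },
    };
  }
  return aggs;
}

// Percentiles across all rules, the same percentiles per rule type,
// and top-N rules per metric, restricted to a time range.
function buildMetricsSearch(metrics: string[], from: string, to: string, topN = 10): SearchBody {
  return {
    size: 0, // only aggregations are needed, not individual events
    query: {
      bool: {
        filter: [
          { term: { "event.action": "execution-metrics" } },
          { range: { "@timestamp": { gte: from, lte: to } } },
        ],
      },
    },
    aggs: {
      ...percentileAggs(metrics),
      by_rule_type: {
        terms: { field: RULE_TYPE_FIELD, size: 100 }, // 100 rule types is an arbitrary cap
        aggs: percentileAggs(metrics),
      },
      ...topRulesAggs(metrics, topN),
    },
  };
}

// Usage: one hypothetical metric over the last 24 hours.
const body = buildMetricsSearch(["total_search_duration_ms"], "now-24h", "now");
```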
@banderror banderror added the Team:ResponseOps, Feature:EventLog, Team:Detections and Resp, Team: SecuritySolution, and Team:Detection Rule Management labels Feb 15, 2022
@elasticmachine
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@pmuellr
Member

pmuellr commented Feb 15, 2022

As an aside, apparently some aggregations can bypass filtering, which makes them a security consideration. I don't know the details, only that we are generally wary of exposing aggregation support at the HTTP API level because of it.

As such, it would be "simplest" for the ResponseOps team if the code did not have to worry about that, which we could achieve by exposing the aggregation API only on the event log plugin APIs and not through an HTTP interface. That API could then accept open-ended aggregations, and the plugins calling it would be responsible for making sure they are not using "security-challenged" aggs in their actual calls.

Long-term I would like to make an open-ended aggs HTTP endpoint for the event log, but not sure what all would be involved in securing it.

This implies that the eventual HTTP endpoints would end up being solution-specific, or perhaps rule-registry specific.
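Under this approach, the calling plugin has to avoid "security-challenged" aggregations itself. One example of the kind of aggregation meant here: Elasticsearch's `global` aggregation puts every document of the searched indices into its bucket regardless of the search query, so it would ignore any RBAC filter placed in that query. A minimal caller-side guard could walk the requested aggregations and reject any type outside an allowlist. The allowlist contents and the `findDisallowedAggs` helper are hypothetical; this is not an API of the event log plugin.

```typescript
// Named aggregations, as they appear under `aggs` in a search request.
type AggMap = Record<string, Record<string, unknown>>;

// HYPOTHETICAL allowlist of aggregation types a caller considers safe.
// `global` is intentionally absent: its bucket ignores the request's query,
// so it would bypass any access-control filter placed there.
const ALLOWED_AGG_TYPES = new Set([
  "terms", "filter", "filters", "date_histogram", "top_hits",
  "percentiles", "avg", "min", "max", "sum", "cardinality", "value_count",
]);

// Returns "path: type" for every aggregation whose type is not allowlisted,
// recursing into nested `aggs` / `aggregations` blocks.
function findDisallowedAggs(aggs: AggMap, path: string[] = []): string[] {
  const problems: string[] = [];
  for (const [name, def] of Object.entries(aggs)) {
    const here = [...path, name];
    for (const [key, value] of Object.entries(def)) {
      if (key === "aggs" || key === "aggregations") {
        problems.push(...findDisallowedAggs(value as AggMap, here));
      } else if (key !== "meta" && !ALLOWED_AGG_TYPES.has(key)) {
        problems.push(`${here.join(" > ")}: ${key}`);
      }
    }
  }
  return problems;
}

// Usage: the `global` aggregation is flagged, the others pass.
const requested: AggMap = {
  by_execution: { terms: { field: "kibana.alert.rule.execution.uuid" } },
  everything: { global: {}, aggs: { total: { value_count: { field: "@timestamp" } } } },
};
const found = findDisallowedAggs(requested);
```

An allowlist rather than a denylist is used so that new or unfamiliar aggregation types are rejected by default.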

@gmmorris
Contributor

gmmorris commented Feb 16, 2022

I think this is a very reasonable compromise @pmuellr.
That said, as the team that designed the security model of Event Log (which is unique in our stack), I think it is up to us to review how this API is used, to make sure consumers aren't circumventing this security model (especially unintentionally 😉).

Let's collaborate closely with the detection team to make sure the usage is secure.

@gmmorris
Contributor

Regarding this requirement:

We need to be able to query Event Log and aggregate over its events even if the rules that generated the events have been deleted and their saved objects don't exist anymore.

We rely on the Alerting RBAC model to evaluate whether a user is allowed to read the rule in question.
Once the rule is deleted, this mechanism can no longer be used.

This suggests we need the RBAC-related fields on the Event Log entries themselves - looking at the mappings, we don't seem to store the consumer or provider fields, both of which are needed.
We do store the namespace, but this means recreating some of the Spaces RBAC logic in Event Log, which is likely a large piece of work.

I'd recommend we split this issue into several deliverables and make incremental progress against it, as some of these requirements are bigger than others and it would be good to identify which ones are prerequisites to a first MVP.

@banderror
Contributor Author

This suggests we need the RBAC-related fields on the Event Log entries themselves - looking at the mappings, we don't seem to store the consumer or provider fields, both of which are needed.

Yup, the good old Event Log RBAC topic strikes back! 🙂

I'd recommend we split this issue into several deliverables and make incremental progress against it, as some of these requirements are bigger than others and it would be good to identify which ones are prerequisites to a first MVP.

Sounds good to me 👍 At this point we're most interested in aggregation support for a single rule, so that we can properly finalize the Rule Execution Log UI. These are probably the items required for that (@spong please correct me if I'm wrong):

  • For aggregated rule execution results view:
    • We should be able to aggregate events of a single given rule.
    • We need to be able to combine aggregations with filters in a single request to Event Log.
  • For plain logs view (e.g. scoped to a single kibana.alert.rule.execution.uuid):
    • We need support for sorting by multiple fields (@timestamp and event.sequence)
    • We shouldn't have restrictions on the fields we can sort by.

Maybe it would be helpful to write some examples of ES requests to the Event Log that would allow us to build these views in the app.
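As a possible starting point for such examples, here is a hedged sketch of the two request bodies, written as TypeScript functions returning plain Elasticsearch query DSL. The first filters to one rule and a time range and, in the same request, builds one bucket per `kibana.alert.rule.execution.uuid` with a `top_hits` sub-aggregation for that execution's events and a `filter` sub-aggregation for one `event.action`. The second is a paginated search scoped to one execution and sorted by `@timestamp`, then `event.sequence`. The `rule.id` field, the metric field, and all bucket and page sizes are assumptions.

```typescript
// ASSUMPTION: RULE_ID_FIELD and DURATION_FIELD are hypothetical; the execution UUID,
// event.action, @timestamp and event.sequence fields are the ones named in this issue.
const RULE_ID_FIELD = "rule.id";
const EXECUTION_UUID_FIELD = "kibana.alert.rule.execution.uuid";
const DURATION_FIELD = "kibana.alert.rule.execution.metrics.total_search_duration_ms";

// Events in the order they were written: timestamp first, sequence as tie-breaker.
const EVENT_ORDER = [
  { "@timestamp": { order: "asc" } },
  { "event.sequence": { order: "asc" } },
];

// Aggregated results view: filters and aggregations in one request,
// one bucket per execution, newest execution first.
function executionResultsRequest(ruleId: string, from: string, to: string, maxExecutions = 50) {
  return {
    size: 0,
    query: {
      bool: {
        filter: [
          { term: { [RULE_ID_FIELD]: ruleId } },
          { range: { "@timestamp": { gte: from, lte: to } } },
        ],
      },
    },
    aggs: {
      executions: {
        terms: { field: EXECUTION_UUID_FIELD, size: maxExecutions, order: { started_at: "desc" } },
        aggs: {
          started_at: { min: { field: "@timestamp" } },
          // all events of the execution (Security and Framework), capped at 100 here
          events: { top_hits: { size: 100, sort: EVENT_ORDER } },
          // an aggregation specific to one event.action
          metrics: {
            filter: { term: { "event.action": "execution-metrics" } },
            aggs: { max_search_duration: { max: { field: DURATION_FIELD } } },
          },
        },
      },
    },
  };
}

// Plain logs view: a paginated search scoped to one execution, sorted by two fields.
function plainLogsRequest(executionUuid: string, page = 1, perPage = 50) {
  return {
    from: (page - 1) * perPage,
    size: perPage,
    query: { bool: { filter: [{ term: { [EXECUTION_UUID_FIELD]: executionUuid } }] } },
    sort: EVENT_ORDER,
  };
}

const agg = executionResultsRequest("rule-1", "now-7d", "now");
const logs = plainLogsRequest("exec-1", 2, 50);
```

Note that `top_hits` is bounded by the index's inner-result window (100 by default), so very chatty executions may need the second request instead.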

@pmuellr
Member

pmuellr commented Feb 16, 2022

looking at the mappings we don't seem to store the consumer or provider fields, both of which are needed.

We don't need the producer, as we can construct large, unwieldy queries based on the rule type instead, which is mapped (this is the way it's done for SOs). That said, perhaps it would be better to have the producer mapped, so we don't need such unwieldy filters :-).
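To illustrate the trade-off: without a mapped producer field, a condition like "rules produced by X" has to be expanded into the list of every rule type whose producer is X, using the mapped rule type field. A sketch under assumptions: the field names `rule.category` and `kibana.alert.rule.producer` are hypothetical, the small `ruleTypeProducers` map is an illustrative stand-in for the rule type registry, and only the producer part of the check is shown (consumers are left out).

```typescript
// ASSUMPTION: hypothetical field names.
const RULE_TYPE_FIELD = "rule.category";
const PRODUCER_FIELD = "kibana.alert.rule.producer"; // not mapped, per this thread

// Illustrative stand-in for the rule type registry: ruleTypeId -> producer.
const ruleTypeProducers = new Map<string, string>([
  ["siem.queryRule", "siem"],
  ["siem.eqlRule", "siem"],
  [".index-threshold", "stackAlerts"],
]);

// Producer condition expressed via the mapped rule type field: list every
// rule type whose producer is allowed. The list grows with the number of rule types.
function filterViaRuleTypes(allowedProducers: string[], registry: Map<string, string>) {
  const ruleTypeIds = Array.from(registry.entries())
    .filter(([, producer]) => allowedProducers.includes(producer))
    .map(([ruleTypeId]) => ruleTypeId);
  return { terms: { [RULE_TYPE_FIELD]: ruleTypeIds } };
}

// The same condition if a producer field were mapped: one short clause.
function filterViaProducer(allowedProducers: string[]) {
  return { terms: { [PRODUCER_FIELD]: allowedProducers } };
}

const viaTypes = filterViaRuleTypes(["siem"], ruleTypeProducers);
const viaProducer = filterViaProducer(["siem"]);
```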

@mikecote mikecote moved this from Awaiting Triage to Todo in AppEx: ResponseOps - Execution & Connectors Feb 17, 2022
@mikecote mikecote moved this from Todo to In Progress in AppEx: ResponseOps - Execution & Connectors Mar 7, 2022
@ymao1
Contributor

ymao1 commented Mar 7, 2022

@banderror I have a draft PR to try to address some (not all!) of these requirements. Can you take a look? https://github.com/elastic/kibana/pull/126948/files

  • Exposes aggregations at the plugin API level, but no HTTP API
  • Support for filter & aggregations in same request
  • Updated the find function to take an array of sort options. This also changes the HTTP API options, so I have to double-check with the team, but the event log API is not an official API, so I think this breaking change might be OK.

This PR does not address the following:

  • No updates to support real RBAC, just checks user access to given rule IDs
  • Did not remove restrictions for fields to sort against. I think if you want to add a new field, you can just add it to the list of allowed fields?
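With sort options passed as an array, sorting by several fields (for example `@timestamp` then `event.sequence`) amounts to translating the list, in order, into an Elasticsearch `sort` array while still enforcing the allowlist of sortable fields. A minimal sketch; the `sort_field`/`sort_order` option shape, the `toEsSort` helper and the contents of `ALLOWED_SORT_FIELDS` are assumptions, not the plugin's actual definitions.

```typescript
type SortOrder = "asc" | "desc";

// ASSUMPTION: hypothetical option shape for this sketch.
interface SortOption {
  sort_field: string;
  sort_order: SortOrder;
}

// HYPOTHETICAL allowlist; supporting a new field means adding it here.
const ALLOWED_SORT_FIELDS = new Set([
  "@timestamp", "event.start", "event.end", "event.duration", "event.action", "event.sequence",
]);

// Converts an ordered list of sort options into an Elasticsearch `sort` array.
// Earlier entries take precedence; fields outside the allowlist are rejected.
function toEsSort(options: SortOption[]): Array<Record<string, { order: SortOrder }>> {
  return options.map(({ sort_field, sort_order }) => {
    if (!ALLOWED_SORT_FIELDS.has(sort_field)) {
      throw new Error(`Sorting by "${sort_field}" is not allowed`);
    }
    return { [sort_field]: { order: sort_order } };
  });
}

// Newest events first, ties broken by event.sequence.
const sort = toEsSort([
  { sort_field: "@timestamp", sort_order: "desc" },
  { sort_field: "event.sequence", sort_order: "desc" },
]);
```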

Question:

  • I am defaulting the size to 0 in the aggregate API and just returning the aggregation part of the ES result. Is this sufficient?
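Setting `size: 0` tells Elasticsearch not to return any hits, so only the aggregations come back; the wrapper then returns just the `aggregations` part of the response. A hedged sketch of that request/response handling, combining a caller's filters with an access filter (for example a check on the rule IDs the user may access) in the same request. The interfaces and the `buildAggregateRequest`/`extractAggregations` helpers are hypothetical, not the event log plugin's API.

```typescript
// Minimal, hypothetical request/response shapes (not the official ES client types).
interface AggregateRequest {
  size: 0;
  query: { bool: { filter: unknown[] } };
  aggs: Record<string, unknown>;
}

interface SearchResponse {
  hits: { hits: unknown[] };
  aggregations?: Record<string, unknown>;
}

// Combines the access filter with the caller's filters and forces size 0,
// so Elasticsearch computes aggregations without returning documents.
function buildAggregateRequest(
  accessFilter: unknown,
  callerFilters: unknown[],
  aggs: Record<string, unknown>,
): AggregateRequest {
  return { size: 0, query: { bool: { filter: [accessFilter, ...callerFilters] } }, aggs };
}

// Returns only the `aggregations` part of a search response.
function extractAggregations(response: SearchResponse): Record<string, unknown> {
  return response.aggregations ?? {};
}

// Usage with a hypothetical access filter on authorized rule IDs and a fake response.
const request = buildAggregateRequest(
  { terms: { "rule.id": ["rule-1"] } },
  [{ term: { "event.action": "execution-metrics" } }],
  { executions: { cardinality: { field: "kibana.alert.rule.execution.uuid" } } },
);
const aggs = extractAggregations({ hits: { hits: [] }, aggregations: { executions: { value: 3 } } });
```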

@spong
Member

spong commented Mar 9, 2022

Thanks @ymao1! I'll give this a test and make sure everything is there to port over the Rule Execution Log (#126215) and provide any additional feedback.

I am defaulting the size to 0 in the aggregate API and just returning the aggregation part of the ES result. Is this sufficient?

This should be fine -- I don't see an immediate need for returning individual docs, so we should be good here.

@ymao1 ymao1 moved this from In Progress to In Review in AppEx: ResponseOps - Execution & Connectors Mar 9, 2022
@ymao1 ymao1 moved this from In Review to Todo in AppEx: ResponseOps - Execution & Connectors Mar 10, 2022
@spong
Member

spong commented Mar 21, 2022

Closing as addressed by #126948

@spong spong closed this as completed Mar 21, 2022
Repository owner moved this from Todo to Done in AppEx: ResponseOps - Execution & Connectors Mar 21, 2022

6 participants