-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Detections] Adoption telemetry #71102
Conversation
This uses ML and raw ES calls to query our ML Jobs and Rules, and parse them into a format to be consumed by telemetry. Still to come: * initialization * tests
The service seems to convert colons to underscores, so let's just use an underscure.
This allows us to test our adherence to the collector API, focusing particularly on the fetch function.
We're going to have our usage data under one key corresponding to the app, so this nests the existing data under a 'detections' key while allowing another fetching function to be plugged into the main collector under a separate key.
e73b13c
to
bbae701
Compare
@elasticmachine merge upstream |
filterPath: ['hits.hits._source.alert.enabled', 'hits.hits._source.alert.tags'], | ||
ignoreUnavailable: true, | ||
index, | ||
size: 10000, // elasticsearch index.max_result_window default value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ever likely to exceed this limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't seem likely that a user'd have 10k rules, but this ensures that if they did, we'd tally the first 10k instead of receiving an error and returning 0s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was used in another collector, but if you think there's risk here I'm happy to remove it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. And I hope they don't have more than 10k rules 😅. I was really just curious if there was a way beyond it beyond scroll
. I ran into an issue previously where I needed to set this max just to get all of the data I wanted (I think ES has a default max it returns?), but it looks fine.
* inlines collector options * inlines schema object * makes DetectionsUsage an interface instead of a type alias
Conflicts: x-pack/plugins/security_solution/server/plugin.ts
Pinging @elastic/siem (Team:SIEM) |
config: { | ||
job_type: 'anomaly_detector', | ||
description: | ||
'SIEM Auditbeat: Looks for unusual destination port activity that could indicate command-and-control, persistence mechanism, or data exfiltration activity (beta)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking we haven't migrated these detections to the SecuritySolution
naming pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took these from live data and the rename hasn't happened yet. I'll be sure to update these appropriately once that happens 👍
); | ||
|
||
if (ruleResults.hits?.hits?.length > 0) { | ||
ruleMetrics = ruleResults.hits.hits.map((hit) => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really an issue, but why not do the tally here rather than iterating twice? Or alternatively, just return ruleResults
and then just iterate once in the usageBuilder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I'll change these to do a single loop 👍
const moduleJobs = modules.flatMap((module) => module.jobs); | ||
const jobs = await jobServiceProvider(mlCaller).jobsSummary(['siem']); | ||
|
||
jobMetrics = jobs.map((job) => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here, but not sure how much of an issue it'll be to loop over all this data twice. I know we have about 60s till the usageCollector times out, but I don't think we'll run into that
We were previously performing two loops over each set of data: one to format it down to just the data we need, and another to convert that into usage data. We now perform both steps within a single loop.
const jobs = await jobServiceProvider(mlCaller).jobsSummary(['siem']); | ||
|
||
jobsUsage = jobs.reduce((usage, job) => { | ||
const isElastic = moduleJobs.some((moduleJob) => moduleJob.id === job.id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does SIEM apply a prefix to the module when initially setting it up?
If so, this won't match and you'll have to do a partial match on the last part of job id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that this is not currently an issue, but good to know moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kibana Telemetry changes LGTM!
detections: { | ||
detection_rules_custom_enabled: { type: 'number' }, | ||
detection_rules_custom_disabled: { type: 'number' }, | ||
detection_rules_elastic_enabled: { type: 'number' }, | ||
detection_rules_elastic_disabled: { type: 'number' }, | ||
ml_jobs_custom_enabled: { type: 'number' }, | ||
ml_jobs_custom_disabled: { type: 'number' }, | ||
ml_jobs_elastic_enabled: { type: 'number' }, | ||
ml_jobs_elastic_disabled: { type: 'number' }, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Can we use type: 'long'
instead (or 'float'
if that's the intended format)?
This will help us understand the type of number to be indexed 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT 2: Have you considered using a tree structure instead?
{
"properties": {
"detections": {
"properties": {
"detection_rules": {
"custom": {
"enabled": { "type": "long" },
"disabled": { "type": "long" }
},
"elastic": {
"enabled": { "type": "long" },
"disabled": { "type": "long" }
}
},
"ml_jobs": {
"custom": {
"enabled": { "type": "long" },
"disabled": { "type": "long" }
},
"elastic": {
"enabled": { "type": "long" },
"disabled": { "type": "long" }
}
}
}
}
}
}
@elasticmachine merge upstream |
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
* style: sort plugin interface * WIP: UsageCollector for Security Adoption This uses ML and raw ES calls to query our ML Jobs and Rules, and parse them into a format to be consumed by telemetry. Still to come: * initialization * tests * Initialize usage collectors during plugin setup * Rename usage key The service seems to convert colons to underscores, so let's just use an underscure. * Collector is ready if we have a kibana index * Refactor collector to generate options in a function This allows us to test our adherence to the collector API, focusing particularly on the fetch function. * Refactor usage collector in anticipation of endpoint data We're going to have our usage data under one key corresponding to the app, so this nests the existing data under a 'detections' key while allowing another fetching function to be plugged into the main collector under a separate key. * Update our collector to satisfy telemetry tooling * inlines collector options * inlines schema object * makes DetectionsUsage an interface instead of a type alias * Extracts telemetry mappings via scripts/telemetry_extract * Refactor detections usage logic to perform one loop instead of two We were previously performing two loops over each set of data: one to format it down to just the data we need, and another to convert that into usage data. We now perform both steps within a single loop. * Refactor detections telemetry to be nested * Extract new nested detections telemetry mappings Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Pinging @elastic/security-solution (Team: SecuritySolution) |
Summary
This adds a new UsageCollector that collects adoption data for both Detection Rules and ML Jobs within our solution's group.
The metrics being collected are:
TODO:
Checklist
Delete any items that are not applicable to this PR.
For maintainers