Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Asset Management] Osquery telemetry updates #100754

Merged
merged 11 commits into from
Jun 17, 2021

Conversation

lykkin
Copy link
Contributor

@lykkin lykkin commented May 27, 2021

Summary

Adds osquery app usage metrics to the kibana telemetry mapping
Adds support to collect osquery app usage metrics

image

TODO in a follow on PR:
Get beat payload size + query duration metrics.

@lykkin lykkin added release_note:enhancement v8.0.0 v7.14.0 Team:Asset Management Security Asset Management Team Feature:Osquery Security Solution Osquery feature auto-backport Deprecated - use backport:version if exact versions are needed labels May 27, 2021
@lykkin lykkin self-assigned this May 27, 2021
@lykkin lykkin marked this pull request as ready for review June 4, 2021 05:51
@lykkin lykkin requested review from a team as code owners June 4, 2021 05:51
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-asset-management (Team:Asset Management)

@lykkin
Copy link
Contributor Author

lykkin commented Jun 4, 2021

@elasticmachine merge upstream

@lykkin
Copy link
Contributor Author

lykkin commented Jun 4, 2021

@elasticmachine merge upstream

Copy link
Member

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema updates LGTM.

nit: Would recommend some unit tests for your usage collectors, as we've uncovered some legit bugs with them in the past (especially in cases where you are relying on stuff happening to determine isReady). Some examples in src/plugins/kibana_usage_collection/server/collectors

Copy link
Contributor

@pjhampton pjhampton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌔 🚀 ✨ LGTM ✨ 🚀 🌔

Just some questions but otherwise, looks good.

) {
const metric = await soClient.get<CounterValue>(usageMetricSavedObjectType, route);
metric.attributes[key] += increment;
await soClient.update(usageMetricSavedObjectType, route, metric.attributes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever flush the state, or does it just increment forever?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a cumulative count. i'm not entirely certain about the life times of saved objects, what events do they get flushed in outside of clearing them manually? i might be able to do something like clearing the count on fetch, but it felt like a wash to me.

are there any situations where delta metrics would be better for this kind of thing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bearing with me. I have been OOO for a couple of days :-)

what events do they get flushed in outside of clearing them manually?

They don't - unless a space is deleted AFAIK.

are there any situations where delta metrics would be better for this kind of thing?

Delta is maybe not a good word to describe it, but from what I have seen the usage collector runs every 24 hr and developers typically try to get a snapshot of the last 24 hours. That makes me think that clearing the count on read is what is needed. But I guess it only matters to people / systems consuming the telemetry. It's your call :- )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the context! i feel better about changing it knowing there's prior art, and knowing about the collection interval.

x-pack/plugins/osquery/server/usage/collector.ts Outdated Show resolved Hide resolved
Copy link
Member

@Bamieh Bamieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema LGTM. I love the changes in this PR :elasticheart: other than the savedObjects client type casting LGTM.

x-pack/plugins/osquery/server/usage/collector.ts Outdated Show resolved Hide resolved
@lykkin
Copy link
Contributor Author

lykkin commented Jun 10, 2021

@elasticmachine merge upstream

@lykkin
Copy link
Contributor Author

lykkin commented Jun 15, 2021

@elastic/kibana-core - i'm curious if the usage of saved objects in this pr would cause potential double counting of the stats being tracked in them when multiple kibana instances are being used for the same ES instance. are my worries misplaced?

@pjhampton
Copy link
Contributor

@elastic/kibana-core - i'm curious if the usage of saved objects in this pr would cause potential double counting of the stats being tracked in them when multiple kibana instances are being used for the same ES instance. are my worries misplaced?

@Bamieh could give a more authoritative answer than me, but I don't believe the usage collector would run on each Kibana instance - I think it would be an available worker approach. This is because from my own experience I haven't seen duplicated telemetry records coming through the security pipelines.

@Bamieh
Copy link
Member

Bamieh commented Jun 17, 2021

@pjhampton @lykkin I'm gonna follow up sync about the PR just to double check I havent missed anything before giving a final OK.

I cant find areas where double counting is possible. We do have usage counters to replace this counting implementation and handle all these details instead of manually writing this logic here.

@lykkin
Copy link
Contributor Author

lykkin commented Jun 17, 2021

@elasticmachine merge upstream

@lykkin lykkin enabled auto-merge (squash) June 17, 2021 17:14
@Bamieh
Copy link
Member

Bamieh commented Jun 17, 2021

I've synced with Bryan and we're good to merge 👍

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/development-plugin-saved-objects.html#_mappings

id before after diff
osquery-usage-metric - 3 +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @lykkin

@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 17, 2021
* first pass of basic osquery usage stats collection

* updates, linting

* updated exported metrics

* clean up comments, add description fields to metric fields

* reworked types

* actually use the updated types

* added tests around the route usage recoder functions

* review comments

* update aggregate types

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
kibanamachine added a commit that referenced this pull request Jun 17, 2021
* first pass of basic osquery usage stats collection

* updates, linting

* updated exported metrics

* clean up comments, add description fields to metric fields

* reworked types

* actually use the updated types

* added tests around the route usage recoder functions

* review comments

* update aggregate types

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Bryan Clement <bclement01@gmail.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jun 18, 2021
…ets-tab

* 'master' of github.com:elastic/kibana: (93 commits)
  [ML] Remove blank job definition as it is unused and out-of-sync with Elasticsearch (elastic#102506)
  [Lens] Fix wrong error detection on transition to Top values operation (elastic#102384)
  [ML] Anomaly detection job custom_settings improvements (elastic#102099)
  [Cases] Route: Get all alerts attach to a case (elastic#101878)
  Fixes wrong list exception type when creating endpoint event filters list (elastic#102522)
  remove search bar that's not working yet (elastic#102550)
  Migrated Ingest Node Pipeline Functional Tests to use test_user (elastic#102409)
  [Maps] clean up feature editing name space to avoid conflicts with layer settings editing (elastic#102516)
  [canvas] Refactor Storybook from bespoke to standard configuration (elastic#101962)
  [Security Solution] adds wrapSequences method (RAC) (elastic#102106)
  [FTR] Stabilize SSLP functional tests (elastic#102553)
  [K8] Added `Inter` font files for new theme (elastic#102359)
  [Workplace Search] Convert Groups pages to new page template (elastic#102449)
  [DOC] Add experimental disclaimer to rollup jobs (elastic#95624)
  [Security Solution][Endpoint] Suppress some of the jest console.error noise created by endpoint list middelware (elastic#102535)
  [Fleet] Improve performance of Fleet setup (elastic#102219)
  [Alerting] Add event log entry when a rule starts executing (elastic#102001)
  [Fleet] Update docker image of registry used in integration tests (elastic#101911)
  [Asset Management] Osquery telemetry updates (elastic#100754)
  Converts saved object tagging to new management layout (elastic#102284)
  ...

# Conflicts:
#	x-pack/plugins/fleet/kibana.json
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Osquery Security Solution Osquery feature release_note:enhancement Team:Asset Management Security Asset Management Team v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants