-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial telemetry setup #69330
initial telemetry setup #69330
Conversation
8c94f41
to
33dadc1
Compare
x-pack/plugins/security_solution/server/lib/telemetry/endpoint_metadata_telemetry.ts
Outdated
Show resolved
Hide resolved
8f7925f
to
790ff51
Compare
Pinging @elastic/endpoint-data-visibility-team (Team:Endpoint Data Visibility) |
f7ee278
to
ae07d5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the TODOs could you move those to github issues?
If we're not really getting endpoint data right now it might be worth switching the names to agent until we are actually getting endpoint specific data.
|
||
export const getDefaultEndpointMetadataTelemetry = (): EndpointMetadataTelemetry => ({ | ||
endpoints: { installed: 0, last_24_hours: 0 }, | ||
os: {} as Record<string, EndpointMetadataOSTelemtry>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does eslint complain if you don't do the as ...
part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did before when I initially set this up, but have since fixed. Will remove, thanks
}, | ||
}); | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A github issue might be a better spot to track the functionality implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'm gonna pull this out, this was more for my own eyes as I finished up the initial implementation
}); | ||
const uniqueHostIds: Set<string> = new Set(); | ||
const endpointTelemetry = getDefaultEndpointMetadataTelemetry(); | ||
endpointTelemetry.endpoints.installed = endpointAgents.length; // All agents who have the endpoint registered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really tracking how many agents are installed? And we're assuming the endpoint has been installed if the agent has been installed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agent references installed packages in the packages
field here. So when we filter for it, this should give us all of the agents that have installed the endpoint. Right now I'm using system
since I haven't been able to get the endpoint successfully installed via the agent 😞. @nchaulet , can you confirm my previous statement is accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually just saw a unique case where I enrolled a second agent on an endpoint and got 2 installed instead of one. Using the unique hosts id's now to track the count to avoid this issue
}; | ||
} | ||
|
||
export interface IngestManagerLocalMetadata extends AgentMetadata { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So fleet stores this information based on the endpoints that are connected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, successfully installed I believe
) { | ||
const collector = usageCollector.makeUsageCollector({ | ||
type: SIEM_USAGE_COLLECTOR_TYPE, | ||
isReady: () => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to wait on the ingest manager being initialized? I suppose if no saved objects existed then we jus wouldn't find them right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea and we wouldn't send empty telemetry until the next time this runs. @TinaHeiligers is there a better way to implement this?
export const getDefaultEndpointMetadataTelemetry = (): EndpointMetadataTelemetry => ({ | ||
endpoints: { installed: 0, last_24_hours: 0 }, | ||
os: {} as Record<string, EndpointMetadataOSTelemtry>, | ||
policies: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove these if we're not planning on implementing them for 7.9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'll take them out unless something magical happens in the next day or two 😂
if (err.output?.statusCode === 404) { | ||
return {}; | ||
} | ||
return getDefaultEndpointMetadataTelemetry(); // TODO: no data returned if an error occured or should fail silently here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I wonder if we should return {}
if an error occurs because we don't know the results? It's probably not very helpful to send 0s right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, you're right. I was back and forth on this and saw different implementations, but I agree that an empty object is definitely more informative.
export const getEndpointMetadataTelemetryFromFleet = async ( | ||
savedObjectsClient: ISavedObjectsRepository | ||
) => { | ||
const { saved_objects: endpointAgents } = await savedObjectsClient.find<AgentSOAttributes>({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelolo24 are these SOs global or space-aware? If you need to search across spaces you'll need to query the kibana index directly. You can see an example of that in rules/jobs telemetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are globals? I was told that I could access the SO directly if they were listed as namepaceType: 'agnostic'
. I'll take a look
ae07d5b
to
9247b6a
Compare
const { host, os, elastic } = localMetadata as AgentLocalMetadata; | ||
|
||
if (lastCheckin && new Date(lastCheckin) > aDayAgo) { | ||
// Get agents that have checked in within the last 24 hours to later see if their endpoints are running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we are able to get policy response details setup, we should be able to get this information in this metadata
|
||
endpointMetadataTelemetry.os = Object.values(osTracker); | ||
|
||
// Handle Last 24 Hours |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the best way to know for certain that an endpoint is actively running is to check the latest status the agent has for the endpoint from it's events list
savedObjectsClient.find<Agent>({ | ||
type: AGENT_SAVED_OBJECT_TYPE, | ||
fields: ['packages', 'last_checkin', 'local_metadata'], | ||
filter: `${AGENT_SAVED_OBJECT_TYPE}.attributes.packages: ${ENDPOINT_PACKAGE_CONSTANT}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of installed integrations are listed in packages
x-pack/plugins/security_solution/server/lib/telemetry/endpoint.mocks.ts
Outdated
Show resolved
Hide resolved
|
||
describe('test security solution endpoint telemetry', () => { | ||
let mockSavedObjectsRepository: jest.Mocked<ISavedObjectsRepository>; | ||
let getFleetSavedObjectsMetadataSpy: jest.SpyInstance<Promise<SavedObjectsFindResponse<Agent>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ These jest.Mocked and jest.SpyInstance deals look pretty cool. Good docs for them anywhere I can read?
x-pack/plugins/security_solution/server/lib/telemetry/endpoint.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/security_solution/server/lib/telemetry/endpoint.ts
Outdated
Show resolved
Hide resolved
const { last_checkin: lastCheckin, local_metadata: localMetadata } = metadataAttributes; | ||
const { host, os, elastic } = localMetadata as AgentLocalMetadata; | ||
|
||
if (lastCheckin && new Date(lastCheckin) > aDayAgo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ This is a big reduce
and it seems like maybe it could be easier to read if we did some of this stuff in front of it as a filter
. This is a ❔ more on the "ignore for now" side, just a comment for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will break apart in next commit
f5449f4
to
f78511e
Compare
cc @nchaulet if you can take a look through when you get a chance. Just want to make sure I didn't miss anything, including potential optimizations, regarding the |
f78511e
to
6aff112
Compare
) => | ||
savedObjectsClient.find<AgentEventSOAttributes>({ | ||
type: AGENT_EVENT_SAVED_OBJECT_TYPE, | ||
filter: `${AGENT_EVENT_SAVED_OBJECT_TYPE}.attributes.agent_id: ${agentId}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-buttner , is there a way for me to also filter for values that have the value endpoint
somewhere in the attributes.message
field? Just trying to think of ways to optimize this here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on filtering this, also maybe fetch only data for the last n time, no? this list could be really long at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not sure about that. I'll have to take a look at how the filtering works with saved objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nchaulet I would do that, but only issue is, if a policy was enabled a week ago, and nothing has changed, then there won't be any endpoint
related updates in the events lists in the past lets say 24 hours. i.e. there isn't a STILL_RUNNING
subtype 😂. Hence why it would be great to filter by all references to endpoint, but don't think that's possible via the filter?
If I were to just check if anything has changed regarding the endpoint in the last 24 hours, then I would have to keep some knowledge from day to day of the previous state, and that feels like a recipe for bugs if something gets missed in the cracks somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's pure EQL and this should work to get all endpoint references in the message! Hoot Hoot 😄
`${AGENT_EVENT_SAVED_OBJECT_TYPE}.attributes.agent_id: ${agentId} and ${AGENT_EVENT_SAVED_OBJECT_TYPE}.attributes.message: "${FLEET_ENDPOINT_PACKAGE_CONSTANT}"`
re here: https://www.elastic.co/guide/en/kibana/master/kuery-query.html#_language_syntax
Quotes around a search term will initiate a phrase search. For example, message:"Quick brown fox" will search for the phrase "quick brown fox" in the message field. Without the quotes, your query will get broken down into tokens via the message field’s configured analyzer and will match documents that contain those tokens, regardless of the order in which they appear. This means documents with "quick brown fox" will match, but so will "quick fox brown". Remember to use quotes if you want to search for a phrase.
@michaelolo24 can you run |
@afharo, done. Sorry about that! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Telemetry changes (schema), LGTM!
Bear in mind, though we are using the same type
as in #71102
You'll get some conflicts if the other PR is merged first (and vice-versa)
}, | ||
schema: { | ||
endpoint: { | ||
total_installed: { type: 'number' }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: To help us out, it might be beneficial if devs specify 'long'
instead.
Looks good as-is anyway. No need to change it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, I can change it though. No biggie. Once @rylnd has his PR in I'll rebase on his and update these to long
😄
savedObjectsClient: ISavedObjectsRepository | ||
) { | ||
const collector = usageCollector.makeUsageCollector<UsageData>({ | ||
type: 'security_solution', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! Plan is for him to merge his first and then I'll make the updates for mine on top of his changes once they're in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 the query to ingest manager saved objects looks good to me
706f5db
to
384733f
Compare
384733f
to
f74433f
Compare
f74433f
to
65ae3f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked it out locally, and I saw endpoints telemetry! I didn't look to closely at implementation details here, but tests look good and data looks 💯
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
* master: (314 commits) [APM] Use status_code field to calculate error rate (elastic#71109) [Observability] Change appLink passing the date range (elastic#71259) [Security] Add Timeline improvements (elastic#71506) adjust vislib bar opacity (elastic#71421) Fix ScopedHistory mock and adapt usages (elastic#71404) [Security Solution] Add hook for reading/writing resolver query params (elastic#70809) [APM] Bug fixes from ML integration testing (elastic#71564) [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404) [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275) [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321) Change signal.rule.risk score mapping from keyword to float (elastic#71126) Added help text where needed on connectors and alert actions UI (elastic#69601) [SIEM][Detections] Value Lists Management Modal (elastic#67068) [test] Skips test preventing promotion of ES snapshot elastic#71582 [test] Skips test preventing promotion of ES snapshot elastic#71555 [ILM] Fix alignment of the timing field (elastic#71273) [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540) initial telemetry setup (elastic#69330) [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027) Search across spaces (elastic#67644) ...
…t-apps-page-titles * 'master' of github.com:elastic/kibana: (88 commits) [ML] Functional tests - disable DFA creation and cloning tests [APM] Use status_code field to calculate error rate (elastic#71109) [Observability] Change appLink passing the date range (elastic#71259) [Security] Add Timeline improvements (elastic#71506) adjust vislib bar opacity (elastic#71421) Fix ScopedHistory mock and adapt usages (elastic#71404) [Security Solution] Add hook for reading/writing resolver query params (elastic#70809) [APM] Bug fixes from ML integration testing (elastic#71564) [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404) [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275) [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321) Change signal.rule.risk score mapping from keyword to float (elastic#71126) Added help text where needed on connectors and alert actions UI (elastic#69601) [SIEM][Detections] Value Lists Management Modal (elastic#67068) [test] Skips test preventing promotion of ES snapshot elastic#71582 [test] Skips test preventing promotion of ES snapshot elastic#71555 [ILM] Fix alignment of the timing field (elastic#71273) [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540) initial telemetry setup (elastic#69330) [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027) ... # Conflicts: # x-pack/plugins/index_management/public/application/index.tsx
Summary
For reference, the original issue: https://github.com/elastic/protections-team/issues/161
This PR looks to retain the following:
Number of active endpoints
Endpoints have protections enabled or not (malware for now) (TBD).
List by customer (provided by the telemetry service, license.issued_to)
OS type
OS version
With the exception of policy details (for now), the above information is tracked in
savedObjects
owned by the ingest_manager team. Given our inability to access our indices directly due to security concerns, this provides the best current alternative.Proposed telemetry schema, but a work in progress depending on discussion here as well as ability to obtain policy details. This description and PR will be updated accordingly as those details are finalized.
Schema conversation (including detections work): https://github.com/elastic/telemetry/issues/392
Checklist
Delete any items that are not applicable to this PR.
For maintainers