[RFC] Data Source Categorization Fields #958
LGTM 👍
<!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. -->

- Stage: **0 (strawperson)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
- Date: **August 26 2020** <!-- The ECS team sets this date at merge time. This is the date of the latest stage advancement. -->
Will update right before we merge to reflect current date.
- Web server

## Usage
Categorization fields in ECS can govern how we categorize these data sources, but only a limited set of event.category values are supported by the schema today. The event categorisation fields are catered to individual events, but don't categorise the data source. Expanding the values we support allows us to align the user experience from ECS, Ingest Manager and the Elastic Website (elastic.co/integrations). Some additional context here: #845 (comment).
Categorization fields in ECS can govern how we categorize these data sources, but only a limited set of event.category values are supported by the schema today. The event categorisation fields are catered to individual events, but don't categorise the data source. Expanding the values we support allows us to align the user experience from ECS, Ingest Manager and the Elastic Website (elastic.co/integrations). Some additional context here: [#845 (comment)](https://github.com/elastic/ecs/pull/845#issuecomment-651414817).
Looks like the Markdown link got lost in the copy/paste.
- productivity
- proxy
- queue/message queue
- security
How wide/all-encompassing are these fields intended to be? It looks like a mixture of pretty narrow as well as pretty wide categories. For example, would all `firewall`, `audit`, `edr`, `ids/ips`, `threat intelligence`, and `vulnerability scanner` categories also be marked `security`?
Similar thoughts with things like `proxy`, `application`, and `cloud`.
Good point. We included some generic categories to allow for searching/correlation across these categories, e.g. show me events across all my security data sources, cloud sources, etc. It could also open up the possibility for subcategories, e.g. AWS being cloud, but within AWS, CloudTrail could fall under security.
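To make the generic-plus-specific idea concrete, here is a minimal Python sketch, assuming a data source may carry more than one category at once. The source names and the mapping below are hypothetical illustrations, not values proposed in the RFC.

```python
# Hypothetical mapping of data sources to categories. The source names are
# made up for illustration and are not part of the RFC.
SOURCE_CATEGORIES = {
    "aws.cloudtrail": {"cloud", "security", "audit"},
    "aws.billing": {"cloud"},
    "example.firewall": {"firewall", "security"},
    "example.webserver": {"web server"},
}


def sources_in_category(category: str) -> list[str]:
    """Return the sorted names of all sources tagged with `category`."""
    return sorted(
        name for name, cats in SOURCE_CATEGORIES.items() if category in cats
    )


# "Show me all my security data sources" becomes a single lookup.
print(sources_in_category("security"))
print(sources_in_category("cloud"))
```

With a multi-valued field like this, a narrow category such as `firewall` and a wide one such as `security` can coexist on the same source, which is one possible answer to the question above.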

The fieldset we use to describe the data source is up for discussion, data_stream.category is a possibility. Here are proposed allowed values:

- apm
Small thing. I suggest we standardize on the capitalization and naming. For example we have an event.category of "iam" but a proposed data_stream.category of "Identity and access management". Also we have an example of "ids" for observer.type and a proposed data_stream.category of "IDS".
+1
Stage 0: Provide a high level summary of the premise of these changes. Briefly describe the nature, purpose, and impact of the changes. ~2-5 sentences.
-->

Elastic currently supports ingestion of data from 180+ sources, and growing. However, we do not have a coherent way to categorise these sources. This has resulted in a disconnect in how we categorize these sources from the Elastic website, in-product experiences and ECS.
I'm wondering if the allowed values data_stream.category and observer.type should be the same?
Good idea to bring this up.
I'm not sure I would go this direction. I think we should establish a list of allowed values, and make sure sources and pipelines populate based on this predictable list. Otherwise we could get all sorts of arbitrary differences in capitalizations and ways of writing things.
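As a sketch of what populating from a predictable list could look like, the Python snippet below normalizes an incoming value and rejects anything outside an allowed set. The set is assembled only from category names mentioned in this thread, and the function name `normalize_category` is hypothetical; the real list and field name are still under discussion.

```python
# Allowed values collected from the categories mentioned in this thread.
# This is an assumption for illustration; the final list is not settled.
ALLOWED_CATEGORIES = frozenset({
    "apm", "application", "audit", "cloud", "edr", "firewall", "ids/ips",
    "productivity", "proxy", "queue/message queue", "security",
    "threat intelligence", "vulnerability scanner", "web server",
})


def normalize_category(raw: str) -> str:
    """Lowercase and collapse whitespace, then check the allowed list."""
    value = " ".join(raw.split()).lower()
    if value not in ALLOWED_CATEGORIES:
        raise ValueError(f"unsupported data source category: {raw!r}")
    return value
```

Under these assumptions, `normalize_category("IDS/IPS")` yields `ids/ips`, while a free-form value such as "Identity and access management" raises an error instead of silently creating a new category.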
@mostlyjason would you mind reviewing the list of proposed categories and suggesting any additional categories on the o11y side? Thanks!
@cosiomoises @paulewing Would you mind taking a look at the proposed security data source categories? These categories may eventually be used to suggest relevant detection rules based on enabled integrations. Would be great to get your thoughts.
@jamiehynds how will these categories be used? Also, how do they relate to our existing categories at https://www.elastic.co/integrations? I see several new ones added and several missing. Is this intended to be a replacement?
@mostlyjason The intent is to provide alignment across the entire user experience from the Elastic website (integration page), to Elastic in-product experiences (e.g. Ingest Manager), to index patterns, to ECS. ECS can govern that alignment via these proposed fields. It also opens up the possibility of aligning detection rules to enabled data sources, e.g. if a user has added a firewall data source, we can suggest appropriate detection rules that relate to firewalls. Maybe there's a similar use case for alerts on the o11y side? These categories are intended to replace the existing integration categories. We haven't included existing categories such as AWS, Azure and Kubernetes as ECS doesn't use vendor names in the schema.
@@ -0,0 +1,75 @@
# 0000: Data Source Categorization Fields
In Ingest Management we had many iterations on naming, and "data source" also has some history in it. I'm wondering what exactly we categorise here. Is it the data itself, which is in data_streams? Do we categorize the data_streams? Do we categorize the source the data is coming from?
Hi @ruflin - the intent here is to categorize the source from where the data is coming from.
Thanks everyone for the great feedback and discussion. With this being a stage 0 candidate, the only criterion required for advancement is agreement that the premise has utility and could be an appropriate addition to ECS. Unless there are objections, I propose we capture the shared feedback and concerns in the proposal doc and begin refining and addressing concerns in the subsequent stages. I've captured this summary of feedback and concerns:
@jamiehynds - is there anyone else's feedback we may need to capture at this stage?
Two questions from my side:
Thanks everyone for the feedback so far.
Thanks as well @ebeahan, I agree we should capture these concerns in the RFC document itself. As we reach a conclusion on some, we can document conclusions in the RFC. The RFC document should stand on its own -- including the concerns & resolutions -- without needing to refer to the PRs themselves too much.
I think the criteria for stage 0 were met a long time ago (this is appropriate in ECS).
With all of the questions in the air at the moment, I suggest we retarget this PR to stage 1. This way we can get closure in this PR, rather than carrying over the discussion to the next PR.
# 0000: Data Source Categorization Fields
<!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. -->

- Stage: **0 (strawperson)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->
I suggest we retarget to stage 1, since there's been so much discussion already.
++

Elastic currently supports ingestion of data from 180+ sources, and growing. However, we do not have a coherent way to categorise these sources. This has resulted in a disconnect in how we categorize these sources from the Elastic website, in-product experiences and ECS.

The fieldset we use to describe the data source is up for discussion, data_stream.category is a possibility. Here are proposed allowed values:
I might have been the one suggesting `data_stream.category` as a possibility, a while ago.
But as the data_stream RFC is progressing, I no longer think this is the right approach.
I think the data_stream fields should be dedicated only to the indexing strategy itself, such as "how the index name is created".
I agree that a way of categorizing data sources is needed, but I think this should be another field, one that would also make sense in the 7.x monolithic indices. Having an out-of-place `data_stream.category` field there would not be appropriate.
## References

* https://github.com/elastic/ecs/issues/901
* https://github.com/elastic/ecs/pull/845
Could you add the link provided by @ruflin to the references, please?
Thanks for providing it, Nic 👍
However let's make sure the link stands the test of time, and link via the latest tag, rather than master:
* https://github.com/elastic/ecs/pull/845
* https://github.com/elastic/package-registry/blob/v0.12.1/util/package.go#L27
Discussed with @jamiehynds out-of-band, and we're not moving forward with this effort at this time.