[RFC] Data Source Categorization Fields #958
Changes from 3 commits
@@ -0,0 +1,73 @@
# 0000: Data Source Categorization Fields
<!-- Leave this ID at 0000. The ECS team will assign a unique, contiguous RFC number upon merging the initial stage of this RFC. -->

- Stage: **0 (strawperson)** <!-- Update to reflect target stage. See https://elastic.github.io/ecs/stages.html -->

Review comment: I suggest we retarget to stage 1, since there's been so much discussion already.

Reply: ++

- Date: **August 26 2020** <!-- The ECS team sets this date at merge time. This is the date of the latest stage advancement. -->

Review comment: Will update right before we merge to reflect current date.

<!--
As you work on your RFC, use the "Stage N" comments to guide you in what you should focus on, for the stage you're targeting.
Feel free to remove these comments as you go along.
-->

<!--
Stage 0: Provide a high level summary of the premise of these changes. Briefly describe the nature, purpose, and impact of the changes. ~2-5 sentences.
-->

Elastic currently supports ingestion of data from 180+ sources, and growing. However, we do not have a coherent way to categorise these sources. This has resulted in a disconnect in how we categorize these sources across the Elastic website, in-product experiences and ECS.

Review comment: I'm wondering if the allowed values of data_stream.category and observer.type should be the same?

Reply: Good idea to bring this up. I'm not sure I would go this direction. I think we should establish a list of allowed values, and make sure sources and pipelines populate based on this predictable list. Otherwise we could get all sorts of arbitrary differences in capitalizations and ways of writing things.

The fieldset we use to describe the data source is up for discussion; data_stream.category is a possibility. Here are proposed allowed values:

Review comment: I might have been the one suggesting this. But as the data_stream RFC is progressing, I no longer think this is the right approach. I think the data_stream fields should be dedicated only to the indexing strategy itself, such as "how the index name is created". I agree that a way of categorizing data sources is needed, but I think we should have this be another field, one that would also make sense in the 7.x monolithic indices. Having an out of place

- apm

Review comment: Small thing. I suggest we standardize on the capitalization and naming. For example, we have an event.category of "iam" but a proposed data_stream.category of "Identity and access management". Also, we have an example of "ids" for observer.type and a proposed data_stream.category of "IDS".

Reply: +1

- application
- audit
- CASB
- cloud
- collaboration
- Config Management
- containers
- CRM
- EDR
- firewall
- Identity and access management
- IDS/IPS
- Operating System
- productivity
- proxy
- queue/message queue
- security

Review comment: How wide/all-encompassing are these fields intended to be? It looks like a mixture of pretty narrow as well as pretty wide categories. For example, would all Similar thoughts with things like

Reply: Good point. We included some generic categories to allow for searching/correlation across these categories, e.g. show me events across all my security data sources, cloud sources, etc. It could also open up the possibility for subcategories, e.g. AWS being cloud, but within AWS, CloudTrail could fall under security.

- storage
- threat intelligence
- ticketing
- VPN
- vulnerability scanner
- Web server

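The review thread asks for a predictable list of allowed values with consistent capitalization. A minimal sketch of how that could be enforced is shown here. The `normalize` and `validate_category` helpers and the lowercase, underscore-separated spellings are assumptions made for illustration; the RFC has not settled on final value names.

```python
import re

# Proposed values copied from the list in the RFC text. The normalized
# spellings derived from them are an assumption, not names agreed in the RFC.
PROPOSED_VALUES = [
    "apm", "application", "audit", "CASB", "cloud", "collaboration",
    "Config Management", "containers", "CRM", "EDR", "firewall",
    "Identity and access management", "IDS/IPS", "Operating System",
    "productivity", "proxy", "queue/message queue", "security", "storage",
    "threat intelligence", "ticketing", "VPN", "vulnerability scanner",
    "Web server",
]


def normalize(value: str) -> str:
    """Lowercase the value and collapse runs of non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", value.strip().lower()).strip("_")


# Hypothetical allowed list: one normalized spelling per proposed value.
ALLOWED = {normalize(v) for v in PROPOSED_VALUES}


def validate_category(value: str) -> str:
    """Return the normalized category, or raise if it is not in the list."""
    normalized = normalize(value)
    if normalized not in ALLOWED:
        raise ValueError(f"unknown data source category: {value!r}")
    return normalized
```

With a check like this in a pipeline, `"IDS/IPS"`, `"ids/ips"` and `" Ids Ips "` would all end up as the same stored value, which is the kind of predictability the reviewers are asking for.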
## Usage
Categorization fields in ECS can govern how we categorize these data sources, but only a limited set of event.category values are supported by the schema today. The event categorisation fields are catered to individual events, but don't categorise the data source. Expanding the values we support allows us to align the user experience across ECS, Ingest Manager and the Elastic website (elastic.co/integrations). Some additional context here: #845 (comment).

Review comment: Looks like the Markdown link got lost in the copy/paste.

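To illustrate the distinction drawn in the Usage paragraph between categorizing an individual event and categorizing its source, here is a small sketch. The field name in `SOURCE_CATEGORY_FIELD` is only a placeholder, since the RFC leaves the field name open, and the sample documents are invented.

```python
from collections import defaultdict

# Placeholder only: the RFC has not decided which field carries this value.
SOURCE_CATEGORY_FIELD = "data_stream.category"

# Invented sample documents. event.category describes each event, while the
# source category describes where the event came from.
events = [
    {"event.category": ["network"], SOURCE_CATEGORY_FIELD: "firewall"},
    {"event.category": ["authentication"], SOURCE_CATEGORY_FIELD: "firewall"},
    {"event.category": ["authentication"], SOURCE_CATEGORY_FIELD: "vpn"},
]


def group_by_source_category(docs):
    """Group documents by the category of the source that produced them."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc[SOURCE_CATEGORY_FIELD]].append(doc)
    return dict(grouped)
```

In this sketch the two firewall events land in the same group even though their event.category values differ, which is the kind of cross-source query that a per-event category alone does not support.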
These categories could also be used to categorise detection rules, to map data sources to corresponding rules. This would improve our onboarding experience by suggesting detection rules to users based on the sources they are ingesting data from.

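The rule suggestion idea could look roughly like the following sketch. The rule names, the `RULES_BY_CATEGORY` mapping and the `suggest_rules` helper are all hypothetical and do not refer to real detection rules or to any existing API.

```python
from typing import Dict, Iterable, List

# Hypothetical rule catalogue: rule names and their category tags are
# invented for illustration only.
RULES_BY_CATEGORY: Dict[str, List[str]] = {
    "firewall": ["example-rule-blocked-outbound-traffic"],
    "vpn": ["example-rule-unusual-vpn-login-location"],
    "cloud": ["example-rule-new-admin-role-assigned"],
}


def suggest_rules(source_categories: Iterable[str]) -> List[str]:
    """Return rules tagged with any category the user is ingesting from."""
    suggestions: List[str] = []
    # Deduplicate and sort so the suggestion order is stable.
    for category in sorted(set(source_categories)):
        suggestions.extend(RULES_BY_CATEGORY.get(category, []))
    return suggestions
```

A user ingesting from firewall and VPN sources would be offered only the two rules tagged with those categories, and a category with no tagged rules simply yields no suggestions.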
## People

The following are the people that consulted on the contents of this RFC.

* @jamiehynds | author
* @exekias | sponsor

## References

* https://github.com/elastic/ecs/issues/901
* https://github.com/elastic/ecs/pull/845

Review comment: Could you add the link provided by @ruflin to the references, please? Thanks for providing it, Nic 👍 However, let's make sure the link stands the test of time, and link via the latest tag rather than master.

### RFC Pull Requests

<!-- An RFC should link to the PRs for each of its stage advancements. -->

* Stage 0: https://github.com/elastic/ecs/pull/958

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->

Review comment: In Ingest Management we had many iterations on naming, and "data source" also has some history in it. I'm wondering what exactly we categorise here. Is it the data itself, which is in data_streams? Do we categorize the data_streams? Or do we categorize the source the data is coming from?

Reply: Hi @ruflin - the intent here is to categorize the source the data is coming from.