event.original optionality across all packages #777
Comments
Pinging @elastic/security-external-integrations (Team:Security-External Integrations) |
This should also go for original event fields (i.e. CEF module). We should not have duplicate fields by default. |
We currently use a tag of "forwarded" to control if host info is
This "feels" similar, we could use a tag to represent We don't have to use tags, but either way I think it would be nice to One other thing I thought of, want to make sure we don't accidentally |
I like the idea of making it easy to reindex data from event.original.
With our pipelines all expecting the original value to be contained in |
I agree with both as well, just to try to sum it up:
Example on adding the configuration option to the UI:
In the agent configuration, to send the configuration object, we will use the tag preserve_original_event
In ingest pipelines, depending on which source field we use, let's say currently is
|
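The tag-based approach summarized above can be sketched roughly as follows. This is only an illustration in Python, not the actual agent or ingest pipeline code: the tag name `preserve_original_event` comes from the comment above, while the boolean option and the helper names `build_tags` and `drop_original_unless_tagged` are hypothetical.

```python
PRESERVE_TAG = "preserve_original_event"  # tag name taken from the comment above


def build_tags(base_tags, preserve_original_event):
    """Return the tag list an agent could attach to each event.

    `preserve_original_event` stands in for a hypothetical boolean UI option.
    """
    tags = list(base_tags)
    if preserve_original_event and PRESERVE_TAG not in tags:
        tags.append(PRESERVE_TAG)
    return tags


def drop_original_unless_tagged(doc):
    """Hypothetical final pipeline step: drop event.original unless tagged."""
    if PRESERVE_TAG not in doc.get("tags", []):
        doc.get("event", {}).pop("original", None)
    return doc
```

With the option off, `event.original` is removed at the end of processing; with it on, the tag travels with the event and the field is kept.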
Then I think this discussion should be happening in in |
What we consider to be the |
Relates: elastic/ecs#841 |
@ruflin I think in this context security/evidence/audit requirements suggest that "original" = the message as recorded by the original entity - before any beats/agent/ingest pipeline modifications. (I'm unsure whether agent/integrations do any local processing, but the distinction, even from a docs perspective, is likely important to those who are thinking of a secured log destination.) |
Consistency across integrations in how we handle original data makes sense, and so does giving users an option to turn copying of the original event content on and off. From my perspective, the important question is whether it should be on or off by default. What are the trade-offs between the two options? Turning it off reduces storage by default, but by how much? Do we have any quantitative data to support that, @jamiehynds? Also, what happens to any custom edge processing or custom ingest pipelines that users have written against event.original? |
@mukeshelastic I don't have quantitative data to illustrate the impact on storage; however, our Security Specialists have regularly encountered storage issues relating to event.original during POCs. @NeilADesai - do you happen to have quantitative data from customers' environments? |
For implementation I'm wondering if we should have something like this at the beginning of the pipeline?
have the pipeline work on
I think that preserves the assumption that |
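The inline code in the comment above was lost, so the exact proposal cannot be recovered. One possible reading, sketched in Python purely as an assumption, is a first pipeline step that moves `message` into `event.original` when the latter is not already set, so the rest of the pipeline can always read from a single field. The function name `normalize_original` is hypothetical.

```python
def normalize_original(doc):
    """Hypothetical first pipeline step.

    If event.original is missing, move `message` into it. If it is already
    set (for example when reindexing), leave the document untouched.
    """
    event = doc.setdefault("event", {})
    if event.get("original") is None and "message" in doc:
        event["original"] = doc.pop("message")
    return doc
```

Under this assumption, later processors would parse `event.original` regardless of whether the document arrived fresh from an agent or was reindexed with the original value already in place.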
One additional thing that just popped into my head: we should keep original messages for any message we do not FULLY parse (example: filebeat / palo alto / global protect). I'm unsure what the overall mechanism should be (adding a tag if a specific message is supported / parsed, etc.), but even if we are parsing portions of the message, given the variations of Cisco alone, let alone different versions, we need to be careful here. From a security / logging / POC perspective I think defaulting to keeping event.original would be a good idea. |
@dainperkins brought up a good point in the ECS meeting, we need to make sure we don't delete |
I think @dainperkins point makes this not a good idea, needs a bit more logic. |
I do think the logic is something we could leave for later though, as long as we can agree on where the logic is to be placed (in Ingest Pipelines) and that we want to preserve |
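To make the "a bit more logic" point concrete, here is a minimal Python sketch of a final pipeline step that keeps the original whenever the message was not fully parsed, even if the user did not ask to preserve it. The `fully_parsed` flag and the function name `finalize` are hypothetical; how a pipeline would actually decide that a message was fully parsed is left open in this thread.

```python
def finalize(doc, fully_parsed):
    """Hypothetical last pipeline step.

    event.original is removed only when BOTH conditions hold:
    the message was fully parsed, and the preserve tag is absent.
    """
    tags = doc.get("tags", [])
    keep = ("preserve_original_event" in tags) or not fully_parsed
    if not keep:
        doc.get("event", {}).pop("original", None)
    return doc
```

The design choice here is to fail safe: when in doubt about parsing, the raw event survives, at the cost of extra storage for those documents.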
Hey all, Adding my 2c to this conversation:
"Parsed" ECS data index contains something like an |
Great ideas @jamesspi, particularly a hash of e.g. event.message & timestamp for integrity, and immediate blob storage for archival chain of custody |
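The integrity-hash idea mentioned above could look something like the following Python sketch. It assumes SHA-256 and a simple separator-based encoding of timestamp and message; neither choice comes from this thread, and the function name `integrity_hash` is hypothetical.

```python
import hashlib


def integrity_hash(message, timestamp):
    """Return a hex SHA-256 digest over the timestamp and raw message.

    A NUL byte separates the two parts so that different
    (timestamp, message) pairs cannot collapse into the same byte string.
    """
    h = hashlib.sha256()
    h.update(timestamp.encode("utf-8"))
    h.update(b"\x00")
    h.update(message.encode("utf-8"))
    return h.hexdigest()
```

Storing such a digest alongside the parsed document would let someone later verify an archived copy of the raw message without keeping the raw message in the index itself.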
Another aspect to the discussion that hasn't been explicitly raised is the perspective of the UIs and users consuming the documents. In the Logs UI we've been struggling with beats modules that don't preserve |
@weltenwort would it make sense to have the logs ui only show those docs with an original message field (defaulting to e.g. event.message or log.original) with a note in beat/apm configs that the original message preservation flag will have an impact on what is available in that UI? Feels like 2 typical use cases (e.g. APM & unnormalized logs where the full message is needed and beneficial vs. e.g. well normalized security logs where the costs associated with 2x the storage may be prohibitive...) |
The
With my comment I wanted to caution against letting an orthogonal concern like storage size optimization interfere with normal field semantics. The So in that sense I would agree that a strong warning seems appropriate for a setting that essentially opts out of having such a human-readable message available. |
What's the latest on this situation? I see we are moving forward with moving from I wonder if we might consider "message" becoming a byte/character-limited field to keep its size impact under control but maintain its human-readability usefulness (letting the module/package decide how to handle truncation etc) and then the original would contain the full message. This might be too hard to change from a backwards compatibility perspective, but it'd be a nice middle ground for those who need to free up space without losing all the value of "message". Alternatively, packages could ship a "message reconstruction" template per |
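The byte-limited "message" idea above could be sketched as below. This is an illustration only: the limit value and the function name `truncate_message` are hypothetical, and a real package might prefer to truncate at a word or field boundary instead.

```python
def truncate_message(message, max_bytes):
    """Truncate `message` to at most `max_bytes` bytes of UTF-8.

    Decoding with errors="ignore" drops a trailing multi-byte character
    that was cut in half, so the result is always valid text.
    """
    encoded = message.encode("utf-8")
    if len(encoded) <= max_bytes:
        return message
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

In this scheme the full text would still live in the original field when preservation is enabled, while "message" stays short enough to remain readable in the UI.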
🤔 Ideally the packages would perform the message reconstruction themselves using a |
@jasonrhodes @weltenwort The implementation is done and tracked here: #994 There will be a UI option for the user called "Preserve Raw Event". All non-metric packages will use this, and will all support this in 7.14. |
Closing this as Marius has implemented the |
@P1llus I'm curious how we plan to keep this enforced / consistent across all log data streams. Should we add some validation at the package-spec level for this feature? |
Our current integrations are inconsistent when it comes to preserving original logs/fields. Some integrations preserve event.original, while others do not. Preserving raw logs has a significant impact on storage, often doubling the size of an event.
While there are cases where preservation of raw logs is a requirement, most users prefer to keep their storage costs as low as possible. Disabling event.original by default, but adding the option to enable it, seems like a reasonable solution.
Could we add a switch to our Fleet packages (not Beat modules) to allow some optionality on the preservation of original events?
Related issue: elastic/beats#14708