Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

event.original optionality across all packages #777

Closed
jamiehynds opened this issue Dec 2, 2020 · 26 comments
Closed

event.original optionality across all packages #777

jamiehynds opened this issue Dec 2, 2020 · 26 comments

Comments

@jamiehynds
Copy link

Our current integrations are inconsistent when it comes to preserving original logs/fields. Some integrations preserve event.original, while others do not. Preserving raw logs has a significant impact on storage, often doubling the size of an event.

While there are cases whereby preservation of raw logs is a requirement, most users prefer to keep their storage costs as low as possible. Disabling event.original by default, but adding the option to enable, seems like the a reasonable solution.

Could we add a switch to our Fleet packages (not Beat modules) to allow some optionality on the preservation of original events.

Related issue: elastic/beats#14708

@elasticmachine
Copy link

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@NeilADesai
Copy link

This should also go for original event fields (i.e. CEF module). We should not have duplicate fields by default.

@jamiehynds jamiehynds changed the title Add option to preserve event.original Add option to preserve original fields Dec 8, 2020
@jamiehynds jamiehynds changed the title Add option to preserve original fields event.original optionality across all modules Mar 3, 2021
@jamiehynds jamiehynds changed the title event.original optionality across all modules event.original optionality across all packages Mar 5, 2021
@P1llus P1llus self-assigned this Mar 5, 2021
@leehinman
Copy link
Contributor

We currently use a tag of "forwarded" to control if host info is
disabled:

This "feels" similar, we could use a tag to represent
"preserve_original_event".

We don't have to use tags, but either way I think it would be nice to
have one way in which we add data to an event to control it's
processing. I also think it would be good to preserve this value so
the data could be re-indexed and get the same effect.

One other thing I thought of, want to make sure we don't accidentally
increase network bandwith by copying to event.original before we send
to ingest node.

@andrewkroh
Copy link
Member

I like the idea of making it easy to reindex data from event.original.

I also think it would be good to preserve this value so
the data could be re-indexed and get the same effect.

With our pipelines all expecting the original value to be contained in message, using _reindex won't work as expected since the message value taken from _source will be the modified non-original value. It would be nice if Filebeat always set event.original rather than message. I guess pipelines could be made to use event.original if it exists, otherwise use message, but I would prefer invariant logic.

@P1llus
Copy link
Member

P1llus commented Mar 9, 2021

I agree with both as well, just to try to sum it up:

  • Change is only to be applied to packages, this includes all packages from both Observability and Security.
  • This does not include special packages like endpoint and APM.
  • The field to be used to preserve the event, is event.original.
  • Instead of copying message to event.original, for reindex purposes the pipelines should be rewritten to use event.original instead. This makes the reindex experience much smoother.
  • Each package will have a configuration option in the Fleet menu, asking for Preserve original event, which defaults to off.
  • If Preserved original event is set to true by the user, a tag is applied which is called preserve_original_event
  • If the tag is set, the ES Ingest Pipeline will be used to preserve the event, instead of local processing by the beat/agent, as that will be removed.
  • event.original should not be a indexed field.

Example on adding the configuration option to the UI:

    vars:
      - name: preserve_original_event
        required: true
        show_user: true
        title: Preserve original event
        description: Preserves a raw copy of the original event, added to the field event.original
        type: bool
        multi: false
        default: false

In the agent configuration, to send the configuration object, we will use the tag preserve_original_event

tags:
{{#if preserve_original_event}}
- preserve_original_event
{{/if}}
{{#each tags as |tag i|}}
 - {{tag}}
{{/each}}

In ingest pipelines, depending on which source field we use, let's say currently is message that we want to preserve

  - set:
      field: event.original
      copy_from: message
      if: "ctx?.tags.contains('preserve_original_event')"

@andrewkroh
Copy link
Member

Change is only to be applied to packages, this includes all packages from both Observability and Security.

Then I think this discussion should be happening in in elastic/integrations. Can you transfer this issue over to there.

@P1llus P1llus transferred this issue from elastic/beats Mar 9, 2021
@ruflin
Copy link
Member

ruflin commented Mar 10, 2021

What we consider to be the original even in this context is the message field when it arrives at the ingest pipeline? In general I like this idea. As @andrewkroh mentioned, this makes reindexing really simple.

@andrewkroh
Copy link
Member

Relates: elastic/ecs#841

@dainperkins
Copy link

@ruflin I think in this context security/evidence/audit requirements suggest that "original" = the message as recorded by the original entity - before any beats/agent/ingest pipeline modifications. (unsure if agent/integrations are doing any local processing, but the distinction, even from a docs perspective is likely important to those who are thinking of a secured log destination

@mukeshelastic
Copy link

Consistency across integrations on how we handle original data makes sense and so does providing an option to users to turn on and off copying of the original event content. From my perspective, the important question is whether to turn it on or off by default. What are the trade offs of between the two options? Turning off reduces storage by default, but how much, do we have any quantitative data to support that @jamiehynds Also what happens to any custom processing at edge or custom ingest pipelines users have written to write against the event.original?

@jamiehynds
Copy link
Author

jamiehynds commented Mar 10, 2021

@mukeshelastic I don't have quantitative data to illustrate the impact on storage, however our Security Specialists have regularly encountered storage issues relating to event.orginal during POC's. @NeilADesai - do you happen to have quantitive data from customer's environments?

@leehinman
Copy link
Contributor

For implementation I'm wondering if we should have something like this at the beginning of the pipeline?

- rename:
    field: message
    target_field: event.original
    if: "message != null && event?.original == null"

have the pipeline work on event.original, and at the end

- drop:
    field: event.original
    if: "!ctx?.tags.contains('preserve_original_event')"

I think that preserves the assumption that message contains the info on original ingest, but allows re-indexing to work off event.original

@dainperkins
Copy link

One additional thing that literally just popped up in my head - we should keep original messages for any message we do not FULLY parse (example : filebeat / palo alto / global protect)

Unsure what the overall mechanism should be (adding a tag if a specific message is a supported / parsed, etc) but even if we are parsing portions of the message, given the variations of cisco alone - let alone different versions, etc. we need to be careful here

from a security / logging / POC perspective I think defaulting to keeping event.original would be a good idea

@leehinman
Copy link
Contributor

@dainperkins brought up a good point in the ECS meeting, we need to make sure we don't delete event.original if we weren't able to do any processing. So if we don't specifically know how to process the event we shouldn't drop it.

@leehinman
Copy link
Contributor

For implementation I'm wondering if we should have something like this at the beginning of the pipeline?

- rename:
    field: message
    target_field: event.original
    if: "message != null && event?.original == null"

have the pipeline work on event.original, and at the end

- drop:
    field: event.original
    if: "!ctx?.tags.contains('preserve_original_event')"

I think that preserves the assumption that message contains the info on original ingest, but allows re-indexing to work off event.original

I think @dainperkins point makes this not a good idea, needs a bit more logic.

@P1llus
Copy link
Member

P1llus commented Mar 10, 2021

For implementation I'm wondering if we should have something like this at the beginning of the pipeline?

- rename:
    field: message
    target_field: event.original
    if: "message != null && event?.original == null"

have the pipeline work on event.original, and at the end

- drop:
    field: event.original
    if: "!ctx?.tags.contains('preserve_original_event')"

I think that preserves the assumption that message contains the info on original ingest, but allows re-indexing to work off event.original

I think @dainperkins point makes this not a good idea, needs a bit more logic.

I do think the logic is something we could leave for later though, as long as we can agree on where the logic is to be placed (in Ingest Pipelines) and that we want to preserve event.original no matter what, which is a very good point.
In terms of the latter, we just need to make sure we perform the copy to event.original as the first action done in the pipeline, ref my comment above, using set instead.

@jamesspi
Copy link

Hey all,

Adding my 2c to this conversation:

  • If we figure out a clever way to cover @dainperkins's point, I think event.original should be "opt-in"
  • When a user does "opt-in", I also think that we should consider putting it in its own index. Why? So it becomes the untouched/unprocessed "system of record"/"chain of custody" index, potentially with a different ILM policy. It can probably also go straight to cold storage. One thing we may want to do to help users who need to reference original/parsed docs is add a field with the _id to each index. In other words:

"Parsed" ECS data index contains something like an event.original.doc._id field, "preserved" event.original index contains something like parsed.doc_id.

@dainperkins
Copy link

Great ideas @jamesspi, particularly with e.g. a hash of e.g. event.message & timestamp for integrity & immediate blob storage for archival chain of custody

@weltenwort
Copy link
Member

Another aspect to the discussion that hasn't been explicitly is the perspective of the UIs and users consuming the documents. In the Logs UI we've been struggling with beats modules that don't preserve message from the beginning, which lead to elastic/beats#14708. IMHO a well-rounded discussion about mandatory fields and storage size should take into account that having a consistent (and backwards-compatible) message field on log entry documents is important for making the log entry consumable by applications and users.

@dainperkins
Copy link

@weltenwort would it make sense to have the logs ui only show those docs with an original message field (defaulting to e.g. event.message or log.original) with a note in beat/apm configs that the original message preservation flag will have an impact on what is available in that UI? Feels like 2 typical use cases (e.g. APM & unnormalized logs where the full message is needed and beneficial vs. e.g. well normalized security logs where the costs associated with 2x the storage may be prohibitive...)

@weltenwort
Copy link
Member

The message ECS field seems to be exactly what should be displayed in UIs. The specs explicitly call it out:

For log events the message field contains the log message, optimized for viewing in a log viewer.

For structured logs without an original message field, other fields can be concatenated to form a human-readable summary of the event.

With my comment I wanted to caution against letting an orthogonal concern like storage size optimization interfere with normal field semantics. The message was never designed to be "original", but to be "useful to human readers". If anything, we want more shippers and packages to populate the message field by default (elastic/beats#14708).

So in that sense I would agree that a strong warning seems appropriate for a setting that essentially opts out of having such a human-readable message available.

@jasonrhodes
Copy link
Member

jasonrhodes commented May 27, 2021

What's the latest on this situation? I see we are moving forward with moving from log.original to event.original, but I'm curious about how we manage including this information vs dropping it. To @weltenwort's point about human readability, I think having a value in "message" is important for lots of discoverability use cases as people begin to install more and more packages via Agent.

I wonder if we might consider "message" becoming a byte/character-limited field to keep its size impact under control but maintain its human-readability usefulness (letting the module/package decide how to handle truncation etc) and then the original would contain the full message. This might be too hard to change from a backwards compatibility perspective, but it'd be a nice middle ground for those who need to free up space without losing all the value of "message".

Alternatively, packages could ship a "message reconstruction" template per event.dataset (do we already have this concept)?

@weltenwort
Copy link
Member

🤔 Ideally the packages would perform the message reconstruction themselves using a message runtime field. Then it would be opaque to the UI whether that message is actually persisted as such.

@P1llus
Copy link
Member

P1llus commented May 31, 2021

@jasonrhodes @weltenwort The implementation is done and tracked here: #994

There will be a UI option for the user called "Preserve Raw Event". All non-metric packages will use this, and will all support this in 7.14.

@jamiehynds
Copy link
Author

Closing this as Marius has implemented the event.original toggle across all integrations via #994

@joshdover
Copy link
Contributor

@P1llus I'm curious how we plan to keep this enforced / consistent across all log data streams. Should we add some validation at the package-spec level for this feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests