Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ECS-Compatibility Mode #924

Closed
kares opened this issue Feb 27, 2020 · 3 comments
Closed

Implement ECS-Compatibility Mode #924

kares opened this issue Feb 27, 2020 · 3 comments
Assignees

Comments

@kares
Copy link
Contributor

kares commented Feb 27, 2020

This is a stub issue, and needs to be fleshed out with details specific to this plugin.


As a part of the effort to make plugins able to run in an ECS-Compatible manner by default in an upcoming release of Logstash, this plugin needs to either implement an ECS-Compatibility mode or certify that it does not implicitly use fields that conflict with ECS.


  • default template will need to be updated to be ECS compliant, geoip data (see also geoip filter) :
"geo" : { # previosly "geoip"
  "dynamic": true,
  "properties" : {
    "ip": { "type": "ip" }, # geo.ip not defined in ECS
    "location" : { "type" : "geo_point" }
    # "latitude" : { "type" : "half_float" }, 
    #     ^^^ already part of location.lat
    # "longitude" : { "type" : "half_float" }
  }
}
@yaauie
Copy link
Contributor

yaauie commented Jul 9, 2020

It has been recommended to me that we use the generated ECS templates from ECS directly:

https://github.com/elastic/ecs/tree/master/generated/elasticsearch

From what I can tell, the only real obstacle here is that the generated templates apply to indexes matching the pattern ecs-*, and this Output Plugin (a) defaults to logstash-XYZ and (b) allows configuration to derive the index name from the event's payload.

Since this is ECS, and the source of the data is less important than the schema, I would move in favor of doing the following when an ECS Compatibility Mode is enabled:

  • using the generated ECS template for the appropriate Elasticsearch version (which we would vendor here and somehow keep in sync)
  • validating the plausibility of events targeting an index that will match:
    • when index is either hard-coded or a format-string with constant prefix and will never match ecs-*, reject it
    • when index is a format-string that starts with a sprintf placeholder, warn that events may not be routed to an index matching the managed template.

Alternatively, we can provide a default index when in ECS mode to ecs-logstash-%{+yyyy.MM.dd}, and re-map the generated template to match ecs-logstash-* before persisting it.

@yaauie yaauie closed this as completed Jul 9, 2020
@yaauie yaauie reopened this Jul 9, 2020
@kares
Copy link
Contributor Author

kares commented Jul 10, 2020

this is starting to make more and more sense (to me) - believe we definitely want to start with the generated ecs template and automate any tweaks as suggested.

when index is either hard-coded or a format-string with constant prefix and will never match ecs-*, reject it

confused about the rejection, does that mean we refuse to write to any hard-coded index that does not use the ecs-* prefix (in ecs mode), why not simply warn here as well? is it because ECS is 'strongly' opinionated about the index naming?

@yaauie
Copy link
Contributor

yaauie commented Jul 10, 2020

confused about the rejection

The goal is to prevent people from accidentally getting into sticky situations that are very hard to get out of, which we can do by rejecting a configuration that we know will not work as specified (refusing to start up the plugin).

Because Elasticsearch creates indexes and fields upon first encounter from a best-guess using the shape of the data and any present templates, the absence of a matching template when the user expects there to be one can mean that a field is created incorrectly. Elasticsearch will then attempt to coerce subsequent encounters of the field to the originally-guessed type, but that can result in indexing rejections (e.g., first encounter long, subsequent is date) or loss of precision (e.g., first encounter is long, but subsequent is float). And because the field types in an Elasticsearch index are immutable, these types of issues require complete reindexing to fix.

I think it's better to prevent people from getting in these situations in the first place.

I'm okay with backing down to a warning (@jsvd had a similar recommendation), but the proposed idea was for the plugin to refuse to start up when configured to manage a template that it would not possibly be able to match -- a situation that can be resolved in one of three ways:

  • by setting manage_template => false; OR
  • by setting index to a value that hard-codes a prefix that guarantees events will target indices managed by the template; OR
  • by setting index to a value that is wholly derived from the event (we cannot determine at plugin startup whether events will match the template, so this should produce a warning instead)

@yaauie yaauie closed this as completed in e7116f4 Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants