Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Set dynamic field mapping to false or runtime #128072

Open
ruflin opened this issue Mar 18, 2022 · 9 comments
Open

[Fleet] Set dynamic field mapping to false or runtime #128072

ruflin opened this issue Mar 18, 2022 · 9 comments
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@ruflin
Copy link
Member

ruflin commented Mar 18, 2022

Fleet with the package managers installs the package assets. Part of it is installing the Elasticsearch index templates. Elasticsearch by default has dynamic fields set to true: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html This was convenient to make sure all fields can be queried. But over the past few months, Elasticsearch introduced runtime fields: https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime.html This allows fields which were not index during ingestion to be still queried.

This issue should be used to discuss what the default setting should be in Fleet for dynamic and how we should handle the default block of dynamic_templates that is added to each template around keywords. Do we still need this?

As an example, here an excerpt mapping of the logs-system.security data stream:

    "mappings": {
      "_meta": {
        "package": {
          "name": "system"
        },
        "managed_by": "fleet",
        "managed": true
      },
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": {
              "ignore_above": 1024,
              "type": "keyword"
            }
          }
        }
      ],
      "date_detection": false,
      "properties": {
        "@timestamp": {
          "type": "date"
        },

Some previous work around optimising the template was done in: #104620

@ruflin ruflin added the Team:Fleet Team label for Observability Data Collection Fleet team label Mar 18, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@ruflin
Copy link
Member Author

ruflin commented Mar 18, 2022

@jpountz Would be great to get your take on the above as this might also have an effect on what we should do with the default templates in Elasticsearch.

@joshdover joshdover self-assigned this Apr 11, 2022
@jlind23 jlind23 assigned mukeshelastic and unassigned joshdover Feb 9, 2023
@jlind23
Copy link
Contributor

jlind23 commented Feb 9, 2023

@mukeshelastic assigning this to you as it has a Needs PM definition status.

@mukeshelastic
Copy link

@vinaychandrasekhar @mlunadia going to need your help understand the priority, use cases and their importance of the requested change before making the change..

@ruflin
Copy link
Member Author

ruflin commented Mar 10, 2023

Doing it by default for all packages might be considered a breaking change. Instead what we could do is making it a config option per dataset to move towards a feature where this becomes the default. (@felixbarny )

@felixbarny
Copy link
Member

I've had a discussion with @jpountz a while ago on whether we should consider changes to the default mappings a breaking change or not.

We could treat the absence of a more specialized index template as a way to mean "I don't care / I don't know, do whatever feels sensible" and set the expectation with users that built-in index templates may change at any time.

@jpountz response:

FWIW this would be my preferred option. We could discuss setting backward compatibility at a higher level by e.g. defining a set of queries and aggregations that we expect to work on mappings produced by default and ensuring that these queries/aggs keep working as we iterate on built-in mappings/settings. But setting strong bw compat guarantees on fields that built-in mappings produce is certainly a no-go as we wouldn't be able to introduce any improvement in a non-breaking way (e.g. improving detection of IP addresses would be breaking because the same field would suddenly get mapped from keyword to ip, which has different ordering rules).

@jpountz
Copy link

jpountz commented Mar 13, 2023

+1 to the above take. :)

Also if my understanding is correct, we're discussing the option to use dynamic: runtime here which is possibly a simpler case since the fields that will get dynamically mapped will have the same semantics (e.g. same ordering), only different performance. This helps reason about whether the change is breaking or not since it can then almost only be about performance, not different results to the same query like if we were to suddenly change some mappings from keyword to ip or vice-versa.

@joshdover
Copy link
Contributor

One thing to consider on performance is how this may affect the ability to do alerting. From my understanding, runtime fields in alerts is generally a no go due to how slow the query may be on a decently sized data set. At the very least, we may want to make sure that when a runtime field is used in an alert, there's helpful feedback if the query is too slow and the fix is to map the field.

@jpountz
Copy link

jpountz commented Mar 16, 2023

It's not the first time that alerting comes up as a reason why we're hesitating to move fields to runtime fields (for good reasons!). For reference, this is the main thing that made me want to enable Elasticsearch to make the decision about runtime vs. not, since it's already tracking field usage. We could map fields as runtime: auto or something like that and then use heuristics like "upon rollover, map the field as a runtime field in the new index if it's never been used at query time, and fully index it otherwise". elastic/elasticsearch#87469 It wouldn't be perfect as there would be some time between when a field starts getting used for querying and when it starts getting indexed, but mistakes would be less impactful than they could be today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

No branches or pull requests

7 participants