
Index Mapping Updates Through OSIS Pipeline Configuration YAML #5038

Open
bircpark opened this issue Oct 9, 2024 · 5 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@bircpark

bircpark commented Oct 9, 2024

Is your feature request related to a problem? Please describe.
Currently, an OSIS pipeline seems to require either manual intervention or downtime when updating the mappings for an index. This includes adding a subfield to an already existing field as well as adding a brand-new field.

Existing Configuration For Mapping

"field_name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword"
        }
    }
}

New Configuration For Mapping

"field_name": {
    "type": "text",
    "fields": {
        "english": {
            "type": "text",
            "analyzer": "english"
        },
        "keyword": {
            "type": "keyword"
        }
    }
}

The english subfield does not appear in the cluster's mapping and requires downtime or manual changes before it can be used.

Describe the solution you'd like
It would be nice for OSIS pipelines to have the ability to update index mappings when they are updated in the pipeline configuration. Once the mapping is updated, something like an update_by_query call (or similar) could populate the new fields.

Describe alternatives you've considered (Optional)
a) A manual change to the mapping with an invocation of the update_by_query API to backfill records
b) Take some downtime to stop the pipeline, delete the index, then restart the pipeline to re-sync data
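Alternative (a) can be sketched as two API calls. The following is a minimal, hedged illustration only: the index name my-index is hypothetical (the thread does not name one), and the snippet only builds the request bodies rather than calling a cluster. Note that PUT _mapping merges with the existing mapping, and an _update_by_query with no script re-indexes each matched document in place from its _source, which rebuilds subfields such as field_name.english.

```python
import json

# Hypothetical index name; the thread does not name one.
INDEX = "my-index"

# Step 1: PUT /<index>/_mapping -- mappings merge, so the body can
# restate the existing field with the new subfield added.
mapping_update = {
    "properties": {
        "field_name": {
            "type": "text",
            "fields": {
                "english": {"type": "text", "analyzer": "english"},
                "keyword": {"type": "keyword"},
            },
        }
    }
}

# Step 2: POST /<index>/_update_by_query -- with no script, each matched
# document is re-indexed from its existing _source, populating the new
# subfield. conflicts=proceed skips documents updated mid-run.
backfill = {"query": {"match_all": {}}}

print(f"PUT /{INDEX}/_mapping")
print(json.dumps(mapping_update, indent=2))
print(f"POST /{INDEX}/_update_by_query?conflicts=proceed")
print(json.dumps(backfill, indent=2))
```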

Additional context
The suggested solution is mainly concerned with updating subfields, since update_by_query will only populate subfields of already existing fields and won't work for brand-new fields introduced to the mapping. For entirely new fields you would need to run something else (such as a Glue job) to update the documents reliably.

@dlvenable
Member

This partially depends on #973. But, it would also need an ability to update the OpenSearch index.

@dlvenable dlvenable added enhancement New feature or request help wanted Extra attention is needed and removed untriaged labels Oct 15, 2024
@dlvenable
Member

@bircpark, what source are you using in this case?

@bircpark
Author

@dlvenable, my source is DynamoDB, using the zero-ETL pipeline integration.

@dlvenable
Member

Based on my understanding, the ask here is for Data Prepper to make a call to PUT <index>/_mapping to update the index's actual mapping based on the user-defined input. This would allow modifications to an existing index as new fields are added.
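A rough sketch of that behavior: compare the user-defined mapping against what the index currently reports, and only issue the PUT when something is missing. All names here are illustrative assumptions, not Data Prepper code.

```python
def needs_mapping_update(current: dict, desired: dict) -> bool:
    """True if any desired property or subfield is missing from current.

    (Illustrative only: this checks for missing keys, not for scalar
    values that changed, e.g. a field whose type was altered.)
    """
    for key, value in desired.items():
        if key not in current:
            return True
        if isinstance(value, dict):
            if not isinstance(current.get(key), dict):
                return True
            if needs_mapping_update(current[key], value):
                return True
    return False

# Current mapping (as returned by GET <index>/_mapping) vs. the
# user-defined mapping from the pipeline configuration.
current = {"properties": {"field_name": {
    "type": "text",
    "fields": {"keyword": {"type": "keyword"}}}}}
desired = {"properties": {"field_name": {
    "type": "text",
    "fields": {"keyword": {"type": "keyword"},
               "english": {"type": "text", "analyzer": "english"}}}}}

print(needs_mapping_update(current, desired))  # True: english is missing
# If True, the pipeline would issue PUT <index>/_mapping with the
# desired mapping; since mappings merge, existing fields are preserved.
```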

@bircpark
Author

Yes that is correct.
