Data Prepper support for dynamic renaming of keys #4849

Open
soghoyanaws opened this issue Aug 20, 2024 · 5 comments · May be fixed by #5074
Labels
enhancement: New feature or request
plugin - processor: A plugin to manipulate data in the data prepper pipeline.
Comments

@soghoyanaws

Is your feature request related to a problem? Please describe.
In our DynamoDB (DDB) table, we have documents that have fields like this:

delivered_at|d87e56e8-f52f-474f-ad18-155b2a08f680: 1722622017.993797

Where the string behind the | is a random string (UUID) and the value is a float (representing a timestamp). We'd like to extract the value from this field in DDB and index it in OpenSearch as simply delivered_at: 1722622017.993797

Describe the solution you'd like
Is it possible to plug in custom code that will

rename 'delivered_at|d87e56e8-f52f-474f-ad18-155b2a08f680: 1722622017.993797' to 'delivered_at: 1722622017.993797'?
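
To make the requested behavior concrete, here is a minimal Python sketch of the transformation. It is only an illustration and not Data Prepper code; the function name `strip_key_suffix` and the choice to cut at the first `|` are assumptions made for this example.

```python
def strip_key_suffix(record: dict, separator: str = "|") -> dict:
    """Return a copy of `record` whose keys are cut at the first separator.

    Hypothetical helper for illustration only. Keys without the separator
    are kept unchanged. If two keys share the same prefix, the later one
    overwrites the earlier one.
    """
    renamed = {}
    for key, value in record.items():
        # partition() returns the text before the first separator,
        # or the whole key when the separator is absent.
        base, _, _ = key.partition(separator)
        renamed[base] = value
    return renamed


doc = {
    "delivered_at|d87e56e8-f52f-474f-ad18-155b2a08f680": 1722622017.993797,
    "id": "abc",
}
result = strip_key_suffix(doc)
print(result)
```

The value is carried over untouched; only the key name changes.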

Describe alternatives you've considered (Optional)
There is a rename_keys processor (https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/rename-keys/), but it currently can only handle static key names. With this config, the pipeline can rename the key, but if the UUID changes, the rename won't work:

processor:
  - rename_keys:
      entries:
        - from_key: "delivered_at|d87e56e8-f52f-474f-ad18-155b2a08f680"
          to_key: "delivered_at"

We also tried a number of different configurations with the grok processor, and none of them worked. The challenge is that some of the pipeline processors support Pipeline Expressions while others do not, and it's not well documented which ones do and which ones don't.


@dlvenable added the enhancement and plugin - processor labels and removed the untriaged label Aug 27, 2024
@dlvenable
Member

Thank you @soghoyanaws for raising this issue. If I understand the problem, the key itself has no well-defined name that we can use. Correct?

If so, we need to support mutating the key name more dynamically. The number of possible key names could be just as varied as the possible values in them. In this case it is a key that starts with a value. In other situations it may be a JSON string.

@soghoyanaws
Author

Hi @dlvenable , correct, the key name is dynamic since it depends on UUID.

@sdhull

sdhull commented Sep 13, 2024

Thanks for opening this issue for us @soghoyanaws (this is a problem from my company 😬). I personally would have liked to rename these keys but unfortunately that would be prohibitively expensive due to the size of the DynamoDB table.

Using regexp matching for target key names in processors across the board would make all processors much more powerful. This is something I was wishing for in many cases.

@dlvenable changed the title from "Data Prepper custom plugin for OpenSearch Ingestion service" to "Data Prepper support for dynamic renaming of keys" Oct 3, 2024
@dlvenable dlvenable added this to the v2.10 milestone Oct 3, 2024
@dlvenable
Member

Here is a possible solution using the existing rename_keys processor.

rename_keys:
  from_key_pattern: '^delivered_at\|.*'
  to_key: delivered_at
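
One detail worth keeping in mind with a pattern like this: in most regex dialects an unescaped `|` means alternation, so the pipe in the key has to be escaped to be matched literally. The Python sketch below only illustrates that general regex behavior. How Data Prepper would apply the pattern (search versus full match) is not specified in this thread, so the use of `search` here is an assumption.

```python
import re

keys = [
    "delivered_at|d87e56e8-f52f-474f-ad18-155b2a08f680",
    "status",
    "created_at",
]

# Unescaped pipe: reads as "^delivered_at" OR ".*", and ".*" matches any key.
unescaped = re.compile(r"^delivered_at|.*")
# Escaped pipe: requires a literal "|" right after "delivered_at".
escaped = re.compile(r"^delivered_at\|.*")

matched_unescaped = [k for k in keys if unescaped.search(k)]
matched_escaped = [k for k in keys if escaped.search(k)]

print(matched_unescaped)
print(matched_escaped)
```

With the unescaped form every key is selected for renaming, while the escaped form selects only the key that carries the UUID suffix.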

Another option we may consider is the ability to rename using the captured group. This could allow us to support more dynamic solutions. For example, perhaps the key can start with either delivered_at or delivered.

rename_keys:
  from_key_pattern: '^(delivered_at|delivered)\|'
  to_key_pattern: "$1"

These two solutions do not need to be in conflict either. We could allow both. Either providing to_key for simple use cases or to_key_pattern for more complicated ones.
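
As a rough sketch of how the capture-group variant could behave, here is the same idea in Python. This is not the processor's implementation: the function `rename_keys_by_pattern` and its parameter names are hypothetical, and Python writes the first captured group as `\1` where the proposal above uses `$1`.

```python
import re


def rename_keys_by_pattern(record: dict, from_key_pattern: str, to_key_template: str) -> dict:
    """Rename every key that matches `from_key_pattern`.

    Hypothetical helper for illustration only. The new key is built by
    expanding `to_key_template` with the groups captured from the old key.
    Keys that do not match are copied unchanged.
    """
    pattern = re.compile(from_key_pattern)
    renamed = {}
    for key, value in record.items():
        match = pattern.search(key)
        if match:
            # expand() substitutes \1, \2, ... with the captured groups.
            renamed[match.expand(to_key_template)] = value
        else:
            renamed[key] = value
    return renamed


record = {
    "delivered_at|d87e56e8-f52f-474f-ad18-155b2a08f680": 1.5,
    "delivered|0b6f1c1e-0000-4000-8000-000000000000": 2.5,
    "status": "ok",
}
renamed = rename_keys_by_pattern(record, r"^(delivered_at|delivered)\|", r"\1")
print(renamed)
```

Passing a template with no group reference, such as `"delivered_at"`, gives the simpler fixed `to_key` behavior, which is one way both options could share a single code path.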

We are also considering using a different processor instead.

@sdhull

sdhull commented Oct 3, 2024

@dlvenable that sounds like a great solution

@dlvenable changed the milestone from v2.10 to v2.11 Oct 7, 2024
@divbok divbok linked a pull request Oct 16, 2024 that will close this issue