Processor for parsing Amazon Ion documents #3730

emmachase · 2023-11-30T21:24:25Z

Is your feature request related to a problem? Please describe.
In my use case, the input contains nested Ion documents. For example:

{
  "event": "{id:\"foo...\", status: ACTIVE, timestamp: 2023-11-30T21:05:23.383Z, amount: dollars::100.0}"
}

I would like to parse these into fields so that I can index and search them in OpenSearch.

Describe the solution you'd like
A processor for parsing Ion documents, parse_ion, similar to parse_json and csv.
The implementation would likely be very similar to parse_json. Under the hood the two could perhaps share most of their logic, each supplying a different ObjectMapper implementation as well as any language-specific configurations.

Describe alternatives you've considered (Optional)
It's possible to preprocess simple, well-formatted Ion documents with regular expressions (substitute_string), converting them to JSON so that parse_json can handle them, but this is hacky, probably slow, and very prone to bugs.
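
To illustrate why the regex route is fragile, here is a minimal sketch that applies one naive rule (wrap every run of word characters followed by a colon in quotes) to the sample document above. The class name NaiveIonToJson and the rule itself are made up for illustration; this is not a substitute_string configuration taken from this issue.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class NaiveIonToJson {
    // Naive rule: any run of word characters followed by ':' is treated as a field name.
    private static final Pattern BARE_KEY = Pattern.compile("(\\w+):");

    static String quoteKeys(String ion) {
        // Wrap each matched "field name" in double quotes.
        return BARE_KEY.matcher(ion).replaceAll("\"$1\":");
    }

    public static void main(String[] args) {
        String ion = "{id:\"foo...\", status: ACTIVE, "
                + "timestamp: 2023-11-30T21:05:23.383Z, amount: dollars::100.0}";
        System.out.println(quoteKeys(ion));
    }
}
```

The rule quotes the real field names, but it also matches inside the timestamp (30T21: and 05:) and on the dollars:: annotation, and it leaves the ACTIVE symbol unquoted, so the output is not valid JSON.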

I have also considered creating a new intermediary service that converts the Ion to JSON before submitting it to Data Prepper, but this adds complexity and defeats the purpose of Data Prepper in general.

Additional context
I'm willing to submit a PR for this, but I would like to get feedback on the idea and approach first.


dlvenable · 2023-11-30T22:45:07Z

@emmachase, thank you for this suggestion. I would suggest that we make this a new processor. This has the advantage of letting the configurations diverge if necessary. Perhaps we'd add certain configurations for looser parsing to one or the other. It would also be clearer for users, who wouldn't look for Ion processing in a JSON processor.

parse_ion:
  source: /ion-string
  destination: /data

And thank you for your interest in submitting a PR. We'd be happy to help get it merged in.

I think this could be easily accomplished by refactoring the ParseJsonProcessor class so that most of the logic moves into a common class. And I'd be fine starting with the ParseIonProcessor in the same Gradle project (parse-json-processor) to keep it simple. Maybe we'd split it eventually to avoid unnecessary dependencies, but as it is, all dependencies must deploy with Data Prepper.

I would also suggest having a separate class for the configuration: ParseIonProcessorConfig. We recently did something similar in our kafka-plugins project, where we decoupled the configurations for the Kafka buffer and source.
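
A rough sketch of how that refactoring could look, under a few assumptions: every class name below is hypothetical (suffixed with Sketch to make that obvious), events are modeled as flat maps rather than Data Prepper events with /ion-string style keys, and a plain Function stands in for the format-specific ObjectMapper mentioned earlier in the thread. It is meant only to show the shape of the split, not the code that was eventually merged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// All names below are hypothetical; they are not the actual Data Prepper classes.

/** Settings that both processors need. */
interface CommonParseConfigSketch {
    String getSource();
    String getDestination();
}

/** JSON-only configuration; it can gain JSON-specific options independently. */
final class JsonConfigSketch implements CommonParseConfigSketch {
    private final String source;
    private final String destination;

    JsonConfigSketch(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }

    public String getSource() { return source; }
    public String getDestination() { return destination; }
}

/** Ion-only configuration, kept separate so the two can diverge. */
final class IonConfigSketch implements CommonParseConfigSketch {
    private final String source;
    private final String destination;

    IonConfigSketch(String source, String destination) {
        this.source = source;
        this.destination = destination;
    }

    public String getSource() { return source; }
    public String getDestination() { return destination; }
}

/** Common logic: read the source field, parse it, store the result. */
abstract class AbstractParseProcessorSketch {
    private final CommonParseConfigSketch config;
    // Stand-in for a format-specific ObjectMapper (an assumption of this sketch).
    private final Function<String, Map<String, Object>> parser;

    AbstractParseProcessorSketch(CommonParseConfigSketch config,
                                 Function<String, Map<String, Object>> parser) {
        this.config = config;
        this.parser = parser;
    }

    Map<String, Object> process(Map<String, Object> event) {
        Object raw = event.get(config.getSource());
        if (!(raw instanceof String)) {
            return event; // nothing to parse; pass the event through unchanged
        }
        Map<String, Object> result = new HashMap<>(event);
        result.put(config.getDestination(), parser.apply((String) raw));
        return result;
    }
}

final class ParseJsonProcessorSketch extends AbstractParseProcessorSketch {
    ParseJsonProcessorSketch(JsonConfigSketch config,
                             Function<String, Map<String, Object>> jsonParser) {
        super(config, jsonParser);
    }
}

final class ParseIonProcessorSketch extends AbstractParseProcessorSketch {
    ParseIonProcessorSketch(IonConfigSketch config,
                            Function<String, Map<String, Object>> ionParser) {
        super(config, ionParser);
    }
}
```

With this shape, each processor only chooses its parser and its own configuration type, so Ion-specific options can be added later without touching the JSON configuration.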

dlvenable · 2023-12-06T21:47:13Z

@emmachase, thank you for the PR. This feature will be released in Data Prepper 2.7.0, currently scheduled for early 2024.

emmachase added the untriaged label Nov 30, 2023

github-project-automation bot added this to Data Prepper Tracking Board Nov 30, 2023

github-project-automation bot moved this to Unplanned in Data Prepper Tracking Board Nov 30, 2023

dlvenable added the plugin - processor label and removed the untriaged label Nov 30, 2023

This was referenced Dec 2, 2023

[DOC] parse_ion processor opensearch-project/documentation-website#5769

Closed

Add parse_ion processor #3803

Merged

dlvenable added this to the v2.7 milestone Dec 4, 2023

dlvenable closed this as completed in #3803 Dec 6, 2023

github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Dec 6, 2023

dlvenable assigned emmachase Dec 11, 2023
