[BUG] : unable to set field_delimiter_regex #2946

Closed
efloresb-tibco opened this issue Jun 27, 2023 · 4 comments · Fixed by #4358
Labels: bug (Something isn't working)

Comments

@efloresb-tibco

efloresb-tibco commented Jun 27, 2023

Describe the bug
When I try to set field_delimiter_regex, I get the following error: Caused by: java.lang.IllegalArgumentException: field_delimiter_regex and field_split_characters cannot both be defined. I am not setting field_split_characters, so I assume it is using its default value.

To Reproduce

  1. Use the configuration given in the documentation:
    field_delimiter_regex: "&\\{2\\}"
  2. Start the Docker container:
    docker run --rm --name data-prepper -p 2021:2021 -v $(pwd)/pipeline.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v $(pwd)/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml -v data-prepper-out:/usr/share/log opensearchproject/data-prepper:latest
  3. See the error:
Caused by: java.lang.IllegalArgumentException: field_delimiter_regex and field_split_characters cannot both be defined.
	at org.opensearch.dataprepper.plugins.processor.keyvalue.KeyValueProcessor.<init>(KeyValueProcessor.java:45) ~[key-value-processor-2.2.1.jar:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
	at org.opensearch.dataprepper.plugin.PluginCreator.newPluginInstance(PluginCreator.java:40) ~[data-prepper-core-2.2.1.jar:?]
	... 35 more

Pipeline config

log-pipeline:
  source:
    http:
  processor:
    - key_value:
        field_delimiter_regex: "&\\{2\\}"
        source: "log"
  sink:
    - file:
        path: /usr/share/log/output-file
@graytaylor0
Member

This is a bug. field_split_characters is given a default value of & here (

), so a config that sets only field_delimiter_regex still fails the validation here (https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/key-value-processor/src/main/java/org/opensearch/dataprepper/plugins/processor/keyvalue/KeyValueProcessor.java#L48).
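To make this failure mode concrete, here is a minimal Java sketch. It is a simplified model, not the Data Prepper source: KeyValueOptions, validate and isAccepted are hypothetical names, and it assumes (as described above) that field_split_characters keeps its default of & whenever the pipeline YAML omits the key. With only the regex set, the model still sees both options as defined.

```java
// Hypothetical, simplified model of the key_value options (not the actual Data Prepper classes).
public class DefaultSplitCharsDemo {

    static final class KeyValueOptions {
        // Assumption: this default survives when the pipeline YAML omits the key.
        String fieldSplitCharacters = "&";
        // Stays null unless the pipeline YAML sets it.
        String fieldDelimiterRegex;
    }

    static boolean isDefined(String value) {
        return value != null && !value.isEmpty();
    }

    // Mirrors the mutual-exclusion check reported in the stack trace.
    static void validate(KeyValueOptions options) {
        if (isDefined(options.fieldSplitCharacters) && isDefined(options.fieldDelimiterRegex)) {
            throw new IllegalArgumentException(
                    "field_delimiter_regex and field_split_characters cannot both be defined.");
        }
    }

    static boolean isAccepted(KeyValueOptions options) {
        try {
            validate(options);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        KeyValueOptions options = new KeyValueOptions();
        options.fieldDelimiterRegex = "&\\{2\\}"; // the only option set in the issue's pipeline
        // Prints false: the default "&" counts as "defined".
        System.out.println(isAccepted(options));
    }
}
```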

Until the bug is fixed, you should be able to work around it by changing your config to:

log-pipeline:
  source:
    http:
  processor:
    - key_value:
        field_split_characters: null
        field_delimiter_regex: "&\\{2\\}"
        source: "log"
  sink:
    - file:
        path: /usr/share/log/output-file

@bbgu1

bbgu1 commented Oct 20, 2023

@graytaylor0 The workaround doesn't work, because field_split_characters is annotated with @NotEmpty. Setting it to null or to an empty string makes plugin initialization fail with the error "fieldSplitCharacters must not be empty". So I am blocked and can't see any way to use field_delimiter_regex at all.
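The two constraints described in this comment can be modeled together to show why no value of field_split_characters works once field_delimiter_regex is set. This is a hypothetical Java sketch, not the plugin's validation code: firstViolation stands in for both the @NotEmpty-style rule and the constructor check, and the order in which the rules run is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two rules discussed in this thread (not the plugin's validation code).
public class NoValidCombinationDemo {

    // Returns null when accepted, otherwise the message of the first rule that fails.
    static String firstViolation(String fieldSplitCharacters, String fieldDelimiterRegex) {
        // Rule 1, like @NotEmpty: the split characters must be non-null and non-empty.
        if (fieldSplitCharacters == null || fieldSplitCharacters.isEmpty()) {
            return "fieldSplitCharacters must not be empty";
        }
        // Rule 2: the mutual-exclusion check from the original stack trace.
        if (fieldDelimiterRegex != null && !fieldDelimiterRegex.isEmpty()) {
            return "field_delimiter_regex and field_split_characters cannot both be defined.";
        }
        return null;
    }

    public static void main(String[] args) {
        String regex = "&\\{2\\}";
        // The default, an explicit null, and an empty string: the values tried in this thread.
        List<String> candidates = Arrays.asList("&", null, "");
        for (String splitChars : candidates) {
            System.out.println("field_split_characters=" + splitChars
                    + " -> " + firstViolation(splitChars, regex));
        }
    }
}
```

Every candidate is rejected by one rule or the other, so in this model the regex option cannot be used at all.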

Why not just let field_delimiter_regex take precedence, i.e. ignore field_split_characters whenever a regex is configured?
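The precedence rule proposed above could look roughly like the following Java sketch. It is only an illustration under assumptions: fieldDelimiter is a hypothetical helper, the regex && is an illustrative value rather than the one from this issue, and this is not the change made in the linked PR. When a regex is set it is compiled and used as-is; otherwise each split character is quoted and treated as a literal alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: field_delimiter_regex, when set, wins over field_split_characters.
public class DelimiterPrecedenceDemo {

    static Pattern fieldDelimiter(String fieldSplitCharacters, String fieldDelimiterRegex) {
        if (fieldDelimiterRegex != null && !fieldDelimiterRegex.isEmpty()) {
            // A configured regex takes precedence; the split characters (even the default) are ignored.
            return Pattern.compile(fieldDelimiterRegex);
        }
        // Fallback: every split character is a literal delimiter, e.g. "&;" becomes \Q&\E|\Q;\E
        List<String> alternatives = new ArrayList<>();
        for (char c : fieldSplitCharacters.toCharArray()) {
            alternatives.add(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(String.join("|", alternatives));
    }

    public static void main(String[] args) {
        String input = "key1=a&&key2=b&c";
        // Regex set: only "&&" separates pairs, so "key2=b&c" stays intact.
        System.out.println(Arrays.toString(fieldDelimiter("&", "&&").split(input)));
        // No regex: the default "&" splits at every ampersand.
        System.out.println(Arrays.toString(fieldDelimiter("&", null).split(input)));
    }
}
```

Checking the regex first means the default value of field_split_characters never has to be cleared, which avoids the @NotEmpty conflict described above.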

@graytaylor0 graytaylor0 added this to the v2.6 milestone Oct 20, 2023
@efloresb-tibco
Author

efloresb-tibco commented Oct 20, 2023

@bbgu1, as a workaround I used the substitute_string processor followed by key_value; I'm not sure whether that helps in your scenario.

    - substitute_string:
          entries:
            - source: "log"
              from: \s(?=[a-z\\_\\-]+=)
              to: "\u0007"
    - key_value:
        source: "log"
        destination: "parsed-log"
        field_split_characters: "\u0007"
        delete_value_regex: "\""
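The two-step workaround above can be approximated in plain Java to show what each processor contributes. This sketch rests on assumptions: that substitute_string applies its from pattern like String.replaceAll, that key_value then splits pairs on the BEL character (U+0007) and keys from values on =, and that delete_value_regex removes every match from each value. The parse helper and the sample log line are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical approximation of substitute_string + key_value from the config above.
public class SubstituteThenSplitDemo {

    // Same pattern as the "from" value in the YAML (a YAML plain scalar keeps backslashes as-is).
    static final String FROM = "\\s(?=[a-z\\\\_\\\\-]+=)";
    // The BEL control character used as the single field split character.
    static final String BEL = "\u0007";

    static Map<String, String> parse(String log) {
        // Step 1 (substitute_string): whitespace right before "key=" becomes BEL.
        String marked = log.replaceAll(FROM, BEL);
        Map<String, String> result = new LinkedHashMap<>();
        // Step 2 (key_value): split pairs on BEL, then key from value on the first "=".
        for (String pair : marked.split(BEL)) {
            String[] kv = pair.split("=", 2);
            // delete_value_regex: "\"" strips double quotes from each value.
            result.put(kv[0], kv.length > 1 ? kv[1].replace("\"", "") : "");
        }
        return result;
    }

    public static void main(String[] args) {
        // Spaces inside the quoted message are not followed by "key=", so they are kept.
        System.out.println(parse("level=info msg=\"user logged in\" user_id=42"));
    }
}
```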

@dlvenable dlvenable modified the milestones: v2.6, v2.6.1 Nov 14, 2023
@dlvenable dlvenable modified the milestones: v2.6.1, v2.6.2 Dec 7, 2023
@dlvenable dlvenable modified the milestones: v2.6.2, v2.7 Jan 11, 2024
@dlvenable dlvenable modified the milestones: v2.7, v2.8 Jan 30, 2024
@dlvenable
Member

PR to resolve this:

#4358

5 participants