A new ottl function/editor to replace a regex pattern match with its hash value or digest #22787

Closed
rnishtala-sumo opened this issue May 25, 2023 · 16 comments · Fixed by #27235
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium


@rnishtala-sumo
Contributor

rnishtala-sumo commented May 25, 2023

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

This function/editor would replace substrings (identified by a regex pattern) with their hash value/digest.

Describe the solution you'd like

There could be two approaches to this:

  • Re-using the replace_all_patterns editor

replace_all_patterns(target, mode, regex, sha1(replacement))
Please refer to: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl/ottlfuncs#replace_all_patterns

  • Defining a new function/editor

hash_all_patterns(target, mode, regex, sha1(replacement))
Which is similar to replace_all_patterns, but accepts a converter as the last argument.

Describe alternatives you've considered

None exist.

Additional context

#22725 (comment)


@rnishtala-sumo rnishtala-sumo added enhancement New feature or request needs triage New item requiring triage labels May 25, 2023

@rnishtala-sumo rnishtala-sumo changed the title A new ottl function to replace a regex pattern match with its hash value or digest A new ottl function/editor to replace a regex pattern match with its hash value or digest May 25, 2023
@TylerHelmuth TylerHelmuth added priority:p2 Medium and removed needs triage New item requiring triage labels May 25, 2023
@TylerHelmuth
Member

OTTL has no way to accept a converter as a parameter; it will always execute the converter first and use the value returned. If we want to do either of the proposed strategies, we'd need to update the grammar and the parser.

If we went with replace_all_patterns I think we'd first want to get #20879 resolved. This would allow us to pass the Converter as an optional parameter.

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented May 25, 2023

There already seems to be one example of a function/editor that accepts a converter as a parameter:

Example: merge_maps(attributes, ParseJSON(body), "upsert")

where ParseJSON is a converter

taken from: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl/ottlfuncs#merge_maps

> OTTL has no way to accept a converter as a parameter; it will always execute the converter first and use the value returned. If we want to do either of the proposed strategies, we'd need to update the grammar and the parser.

So can we not follow a similar pattern for a new hash function/editor?

@TylerHelmuth
Member

Converters can be used as parameters, but they are executed before the main function runs. In that example ParseJSON is executed first and the resulting map is passed to the merge_maps function.

For this issue, in order for the capture group logic to work, the Converter function itself would need to be passed into the replace_all_patterns function so that it could be executed once the capture group has been expanded.

@rnishtala-sumo
Contributor Author

Considering the above limitation with passing a converter function, could we do this instead for the first iteration?

hash_all_patterns(target, mode, regex, hashType)
where hashType could take the following values

  • sha1
  • sha256
  • fnv

hash_all_patterns would then call the appropriate ottl hash function, a bit like the convertCase function below:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/ottlfuncs/func_convert_case.go#L50
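
A minimal Go sketch of that dispatch idea, assuming a switch on the hashType string. `hashWith` and `hashAllPatterns` are hypothetical names, the mode argument is left out for brevity, and "fnv" is assumed here to mean 64-bit FNV-1a rendered as a decimal string. None of this is taken from the ottlfuncs source.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/fnv"
	"regexp"
	"strconv"
)

// hashWith returns the digest of s for the named algorithm.
func hashWith(hashType, s string) (string, error) {
	switch hashType {
	case "sha1":
		sum := sha1.Sum([]byte(s))
		return hex.EncodeToString(sum[:]), nil
	case "sha256":
		sum := sha256.Sum256([]byte(s))
		return hex.EncodeToString(sum[:]), nil
	case "fnv":
		h := fnv.New64a() // assumption: 64-bit FNV-1a
		h.Write([]byte(s))
		return strconv.FormatUint(h.Sum64(), 10), nil
	default:
		return "", fmt.Errorf("unsupported hash type %q", hashType)
	}
}

// hashAllPatterns replaces each regex match in s with its digest.
func hashAllPatterns(s, pattern, hashType string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	// Validate hashType once, up front, so the callback cannot fail.
	if _, err := hashWith(hashType, ""); err != nil {
		return "", err
	}
	return re.ReplaceAllStringFunc(s, func(m string) string {
		digest, _ := hashWith(hashType, m)
		return digest
	}), nil
}

func main() {
	out, err := hashAllPatterns("user=alice", `alice`, "sha256")
	fmt.Println(out, err)
}
```

Rejecting an unknown hashType before any replacement happens keeps the error at function-creation time rather than per record.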

@TylerHelmuth
Member

TylerHelmuth commented May 30, 2023

I would prefer to solve this via #20879, #22961 and replace_all_patterns. Adding a similar function such as hash_all_patterns would give us more to maintain and would add one more function to the replace_* list, which is already kinda confusing.

It is extra work, but finishing off those 2 issues first will provide the best end-user experience, reduce breaking changes, and reduce tech debt.

@TylerHelmuth
Member

For extra clarity: resolving the 2 issues linked above will leave all existing users of replace_all_patterns unaffected, but will also allow something like replace_all_patterns(attributes, "value", "^kube_([0-9A-Za-z]+_)", "k8s.$$1.", converter=SHA256)

The function already allows looking through all the attribute values for a specific pattern and then changing them dynamically based on those values; adding in extra manipulation to that action feels reasonable to enable flexible and complex transformations.

Open question: should the optional parameter accept a list of converters or can that functionality be handled via multiple statements? I believe accepting a list would be more efficient.

@TylerHelmuth
Member

We should also highlight in the function's documentation that the extra converters are only needed if you are using capture groups. If you are replacing patterns with static values, you can use the existing converter logic: replace_all_patterns(attributes, "value", "^kube_([0-9A-Za-z]+_)", SHA256("my static value"))

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented May 31, 2023

> Open question: should the optional parameter accept a list of converters or can that functionality be handled via multiple statements? I believe accepting a list would be more efficient.

A list of converters could be useful, but what about the scenario below:

replace_all_patterns(attributes, "value", "^kube_([0-9A-Za-z]+_)", "k8s.$$1.", converter=[ConvertCase(target, case), Substring(target, start, length)] )

The two converters take different mandatory input parameters. Could this be an issue? We could limit the list to converters that take exactly one input parameter, for example:

replace_all_patterns(attributes["duration_ms"], "value", "^time=([0-9]+) ms", "$$1", converter=[log, Int] )

In the above example we first apply log and then Int. Both only take one input parameter.

@TylerHelmuth
Member

Ya, those are definitely concerns that will need to be worked through for #22961. The ability to pass a converter to a function may need to be "typed" so that the functions accepting them as parameters can set expectations. Hopefully a strategy similar to StringGetter or StringLikeGetter can be used.

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Jun 6, 2023

Also, replace_pattern(target, regex, replacement) currently only allows a replacement string. Should we also allow replacement to be a path expression to a string telemetry field? For example:

replace_pattern(attributes["log"], "^device=([0-9A-Za-z]+)", attributes["device_name"])

In the above, we're replacing the device name using an attribute field.

@TylerHelmuth
Member

@rnishtala-sumo yes I think we can allow the replacement value to come from another attribute using StringGetter. That is also a non-breaking change.

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Jun 6, 2023

OK great, I can raise a PR for this soon. Supporting this would also give users an alternative: extract the value, hash it (using OTTL functions), and then replace it. Example below:

  attributes/extract_device_attribute:
    actions:
      - key: message
        pattern: "^device=(?P<device_name>\\w+)$"
        action: extract
  transform/replace:
    log_statements:
      - context: log
        statements:
          - set(attributes["device_name"], FNV(attributes["device_name"]))
          - replace_pattern(attributes["message"], "^device=([0-9A-Za-z]+)", attributes["device_name"])

The above could be used to hash a device name in a log like the one below:
{"timestamp": "2022-12-23T12:34:56Z","level": "info","message": "device=dev12c","request_id": "1234567890","user_id": "abcdefghij"}
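
The same extract, hash, replace sequence can be sketched in plain Go to show what ends up in the message. This is a rough sketch with assumptions: `fnvString` and `hashDevice` are hypothetical names, FNV is assumed to be 64-bit FNV-1a, and the hash is rendered as a decimal string so it can be used as a replacement.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"strconv"
)

var deviceRe = regexp.MustCompile(`^device=(\w+)$`)

// fnvString returns the 64-bit FNV-1a hash of s as a decimal string
// (assumed variant and encoding).
func fnvString(s string) string {
	h := fnv.New64a()
	h.Write([]byte(s))
	return strconv.FormatUint(h.Sum64(), 10)
}

// hashDevice mirrors the three steps of the config above.
func hashDevice(message string) string {
	// Step 1: extract the device name from the message.
	m := deviceRe.FindStringSubmatch(message)
	if m == nil {
		return message // nothing to extract, leave the message alone
	}
	// Step 2: hash the extracted value.
	hashed := fnvString(m[1])
	// Step 3: replace the whole matched text with the hashed value.
	return deviceRe.ReplaceAllLiteralString(message, hashed)
}

func main() {
	fmt.Println(hashDevice("device=dev12c"))
}
```

As in the statements above, the replacement covers the entire match, so the `device=` prefix is replaced along with the name.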

Not ideal, maybe a bit hacky, but I would love to hear thoughts on this.

@rnishtala-sumo
Contributor Author

rnishtala-sumo commented Jun 7, 2023

Created the following PR for review, based on the above discussion: #23210

@github-actions
Contributor

github-actions bot commented Aug 7, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


evan-bradley pushed a commit that referenced this issue Oct 11, 2023
**Description:** Functions to modify matched text during replacement can
now be passed as optional arguments to the following Editors:
- replace_pattern
- replace_all_patterns
- replace_match
- replace_all_matches

**Documentation:**

https://github.com/rnishtala-sumo/opentelemetry-collector-contrib/blob/ottl-replace-pattern/pkg/ottl/ottlfuncs/README.md#replace_pattern

**Issue:**
Resolves #22787