Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Function which converts slice into map #35256

Closed
djaglowski opened this issue Sep 17, 2024 · 18 comments
Closed

[pkg/ottl] Function which converts slice into map #35256

djaglowski opened this issue Sep 17, 2024 · 18 comments
Labels
discussion needed Community discussion needed enhancement New feature or request pkg/ottl

Comments

@djaglowski
Copy link
Member

djaglowski commented Sep 17, 2024

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

I often hear from users that it's difficult to work with slices in OTTL and/or backends. Issues such as #29289 may alleviate this to some extent, but I believe in many cases users would prefer to work around slices by converting them to maps. This obviously comes with some assumptions, but I think there may be a reasonable way to provide this functionality.

Describe the solution you'd like

A new function called Associate, which requires two parameters:

  1. A path to an array.
  2. A "sub-path", defined by a slice of strings, which traverses to the desired key in each element.

e.g. Given a log record with the following attributes:

attributes:
  hello: world
  things:
    - name: foo
      value: 2
    - name: bar
      value: 5

A user could write something like Associate(attributes["things"], ["name"]) and expect the following output:

attributes:
  hello: world
  things:
    foo:
      name: foo
      value: 2
    bar:
      name: bar
      value: 5

A slightly more complicated example highlights the notion of a sub-path. I'm not sure if OTTL has a mechanism for this, but if the array contains more complicated values, it may be necessary to drill further into the object to find its key.

attributes:
  hello: world
  things:
    - value: 2
      details:
        name: foo
    - value: 5
      details:
        name: bar

In the above case, Associate(attributes["things"], ["details", "name"]) would produce:

attributes:
  hello: world
  things:
    foo:
      value: 2
      details:
        name: foo
    bar:
      value: 5
      details:
        name: bar

It may make sense for users to specify, using an optional boolean, whether they wish to delete the key field, since it becomes redundant. (Or maybe this is just the default/only behavior.) Using the previous example, we'd expect:

attributes:
  hello: world
  things:
    foo:
      value: 2
      details: {}
    bar:
      value: 5
      details: {}

Finally, this comes with some caveats about uniqueness of keys. I think since arrays are ordered it is simple enough to say that in the case of duplicates, the first or last one wins.

Describe alternatives you've considered

No response

Additional context

No response

@djaglowski djaglowski added enhancement New feature or request needs triage New item requiring triage labels Sep 17, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member Author

Another option would be useful as well, to allow the value to simultaneously be specified via a sub-path.

attributes:
  hello: world
  things:
    - name: foo
      details:
        value: 2
    - name: bar
      details:
        value: 5

Associate(attributes["things"], ["name"], ["details", "value"] ) would produce:

attributes:
  hello: world
  things:
    foo: 2
    bar: 5

@bacherfl
Copy link
Contributor

I would gladly work on a PR if this function should be added

@gantrior
Copy link

I would be glad if this gets implemented

@bacherfl
Copy link
Contributor

I have created a draft PR now to outline how this could be implemented. Will keep it in draft mode for now, until the code owners have agreed that this should be added

@TylerHelmuth
Copy link
Member

I'm not totally clear on what problems we're solving, but I acknowledge that this transformation could be possible.

I'd like to talk some more about edge cases:

  • What happens with nested slices?
  • What happens if the slice is not a []map[string]any?
  • What is the associated field has a value that isn't a string?

I also think the name needs to be more direct. Associate is not self-descriptive enough in my opinion. Something like SliceToMap is clearer.

@TylerHelmuth TylerHelmuth added discussion needed Community discussion needed and removed needs triage New item requiring triage labels Sep 25, 2024
@djaglowski
Copy link
Member Author

djaglowski commented Sep 25, 2024

I'm not totally clear on what problems we're solving

Here's an example of where this would be useful. Windows Event Logs contain content like the following:

    <EventData>
        <Data Name='SubjectUserSid'>S-1-0-0</Data>
        <Data Name='SubjectUserName'>-</Data>
        <Data Name='SubjectDomainName'>-</Data>
        <Data Name='SubjectLogonId'>0x0</Data>
        <Data Name='TargetUserSid'>S-1-0-0</Data>
        <Data Name='TargetUserName'>mr_rogers</Data>
        <Data Name='TargetDomainName'>-</Data>
        <Data Name='Status'>0xc000006d</Data>
        <Data Name='FailureReason'>%%2313</Data>
        <Data Name='SubStatus'>0xc0000064</Data>
        <Data Name='LogonType'>3</Data>
        <Data Name='LogonProcessName'>NtLmSsp </Data>
        <Data Name='AuthenticationPackageName'>NTLM</Data>
        <Data Name='WorkstationName'>D-508</Data>
        <Data Name='TransmittedServices'>-</Data>
        <Data Name='LmPackageName'>-</Data>
        <Data Name='KeyLength'>0</Data>
        <Data Name='ProcessId'>0x0</Data>
        <Data Name='ProcessName'>-</Data>
        <Data Name='IpAddress'>194.169.175.45</Data>
        <Data Name='IpPort'>0</Data>
    </EventData>

We could debate which way this should be parsed but one reasonable representation would be a slice of "Data" maps, where each contains a key "Name" and a value.

Given such a scenario, it is very difficult to work with the slice in OTTL. I cannot for example refer directly to the "ProcessId" field unless I know exactly where to expect it in the slice, and even then this would be fragile.

Converting this to a map allows me to refer to it in a standard way, e.g. attributes["EventData"]["ProcessID"]

In practice, folks at my company have run into many similar situations. As you noted, not all cases can be solved by this because slices may contain varying data. Still, I would think that a function like this could be very useful without being fragile.


What happens if the slice is not a []map[string]any?

I would think this is a requirement. The function could validate this before doing anything and then nop if not satisfied.

What happens with nested slices?

I think this is covered if it must be []map[string]any

What is the associated field has a value that isn't a string?

It can probably be a requirement. In theory we could tostring some other types but I doubt that would be very useful.

I also think the name needs to be more direct. Associate is not self-descriptive enough in my opinion. Something like SliceToMap is clearer.

Seems reasonable to me.

@bacherfl
Copy link
Contributor

@djaglowski what would you say makes more sense in case one of the slice items is not a []map[string]any, or the path to the key or value can not be resolved? Either

a) Return an error and exit the function
b) Ignore the current item and proceed with attempting to convert the remaining slice items

In my draft I went with option a for now, but looking at it again, there also might be arguments for option b.

@djaglowski
Copy link
Member Author

I can see users wanting either option, but personally I would typically prefer a best effort transformation.

There may be subsequent statements which depend on a struct, or on a particular key/value in the struct which could be available in spite of the error.

For example, say [ { "k": "foo", "v": "one" }, "invalid", { "k": "bar", "v": "two" } ] is the input. If the function returns { "foo": "one", "bar": "two" } (and say I assign it to the body), then set(attribute["foo"], body["foo"]) still works.

I think there would be a wide variety of use cases here, with some being broken by any error, and others being fairly resilient.
I know the transform processor has a setting called error_mode which gives the user control over this, and if I'm understanding correctly, the the ignore setting is basically intended for best effort, which implies to me that the functions should make a best effort at all times but also return the error.

@bacherfl
Copy link
Contributor

I can see users wanting either option, but personally I would typically prefer a best effort transformation.

There may be subsequent statements which depend on a struct, or on a particular key/value in the struct which could be available in spite of the error.

For example, say [ { "k": "foo", "v": "one" }, "invalid", { "k": "bar", "v": "two" } ] is the input. If the function returns { "foo": "one", "bar": "two" } (and say I assign it to the body), then set(attribute["foo"], body["foo"]) still works.

I think there would be a wide variety of use cases here, with some being broken by any error, and others being fairly resilient. I know the transform processor has a setting called error_mode which gives the user control over this, and if I'm understanding correctly, the the ignore setting is basically intended for best effort, which implies to me that the functions should make a best effort at all times but also return the error.

Thanks @djaglowski, that makes sense - I will adapt the implementation accordingly

@djaglowski
Copy link
Member Author

@bacherfl, although we still need buy in, the implementation looks good to me.

@TylerHelmuth
Copy link
Member

@evan-bradley curious on your thoughts.

@evan-bradley
Copy link
Contributor

I'm okay adding this. The design for loops in OTTL is still unclear, and until then working with list-like structures will require functions like this that do it inside a single statement. I also think that while imperfect, this function would be a workable patch on top of some situations OTTL is currently unable to solve:

  1. A scenario very similar to the one @djaglowski mentioned: If I have a list of items and no knowledge about how they are ordered, I have no way of pulling out a particular item from that list without some very cumbersome workarounds. The only possible way of doing this off the top of my head would be to hardcode x nearly-identical statements checking for the item at array index 0..x, then move the value to the cache and work with it later if it is found.
  2. Converting a list to a map would require incredibly cumbersome workarounds like in (1). I think the example @djaglowski gave shows where this conversion would be useful for putting metadata coming in as a list (e.g. in a log body) into attributes on a telemetry item.

We should probably revisit the design and utility of this function once we have loops, but in the meantime I think we could probably make this roughly approximate loops for a reasonable number of use-cases. For the use-cases this doesn't solve, we can document where this function is limited, and solve those situations with proper loops later on.

@djaglowski
Copy link
Member Author

djaglowski commented Oct 17, 2024

If there's agreement, the PR is ready to review here: #35412

@djaglowski
Copy link
Member Author

@TylerHelmuth, any further thoughts on this?

@TylerHelmuth
Copy link
Member

I am ok moving forward with this concept since we haven't solved loops yet.

@Dylan-M
Copy link

Dylan-M commented Oct 30, 2024

If there's agreement, the PR is ready to review here: #35256

You linked back to this issue. Actual PR: #35412

djaglowski added a commit that referenced this issue Nov 4, 2024
**Description:** This PR adds a function that converts slices to maps,
as described in the linked issue. Currently still WIP, but creating a
draft PR already to show how this could be implemented and used

**Link to tracking Issue:** #35256 

**Testing:** Added unit and end to end tests

**Documentation:** Added description for the new function in the readme
file

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
@djaglowski
Copy link
Member Author

Closed by #35412

sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
**Description:** This PR adds a function that converts slices to maps,
as described in the linked issue. Currently still WIP, but creating a
draft PR already to show how this could be implemented and used

**Link to tracking Issue:** open-telemetry#35256 

**Testing:** Added unit and end to end tests

**Documentation:** Added description for the new function in the readme
file

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion needed Community discussion needed enhancement New feature or request pkg/ottl
Projects
None yet
Development

No branches or pull requests

6 participants