[processor/transform] Add Conversion Function to OTTL for Exponential Histo --> Histogram #33824

Merged 70 commits on Sep 21, 2024
Changes from 52 commits
Commits
aa01984
added convert_exponential_hist_to_bucketed_hist function
daidokoro Jun 28, 2024
da0d514
added feature flag for new convert function
daidokoro Jun 28, 2024
2bb4c6d
added tests for convert_exponential_hist_to_bucketed_hist
daidokoro Jun 30, 2024
edcaa14
removed commented code
daidokoro Jun 30, 2024
7f8878e
add changelog
daidokoro Jul 1, 2024
487b418
Merge branch 'main' into cds-1320
daidokoro Jul 1, 2024
05d6a6b
renamed function
daidokoro Jul 1, 2024
e9d8aff
renamed and fixed test
daidokoro Jul 1, 2024
9ce3395
changed function name
daidokoro Jul 1, 2024
7fa713f
updated README
daidokoro Jul 1, 2024
1f20a5f
readme
daidokoro Jul 1, 2024
5e26f7d
added warning about usage
daidokoro Jul 1, 2024
ccbe496
Merge branch 'main' into cds-1320
daidokoro Jul 1, 2024
911673f
add more testx
daidokoro Jul 2, 2024
d80c080
updated function comments
daidokoro Jul 2, 2024
1ff58fa
Merge branch 'cds-1320' of github.com:coralogix/opentelemetry-collect…
daidokoro Jul 2, 2024
cdd8ee9
updated feature-gate flag name
daidokoro Jul 2, 2024
e23ef6f
Merge branch 'main' into cds-1320
daidokoro Jul 2, 2024
30e791e
Merge branch 'main' into cds-1320
daidokoro Jul 2, 2024
d4f9032
adjust feature gate description
daidokoro Jul 3, 2024
319b1f2
Merge branch 'main' into cds-1320
daidokoro Jul 3, 2024
4218b95
Merge branch 'main' into cds-1320
daidokoro Jul 3, 2024
1a8e731
Merge branch 'main' into cds-1320
daidokoro Jul 5, 2024
47cc0c4
Merge branch 'main' into cds-1320
daidokoro Jul 8, 2024
8baa474
fixed typos
daidokoro Jul 11, 2024
d0bf902
Merge branch 'cds-1320' of github.com:coralogix/opentelemetry-collect…
daidokoro Jul 11, 2024
d6cf406
added chloggen
daidokoro Jul 11, 2024
9e3cbfa
removed duplicate change log
daidokoro Jul 15, 2024
a78729b
updated readme with addition distribution approaches
daidokoro Jul 30, 2024
1b5558a
added random, uniform and midpoint distribution functions and updated…
daidokoro Jul 30, 2024
a260667
added tests for random, uniform and midpoint distribution implementat…
daidokoro Jul 30, 2024
ba8b742
added overflow assertion to random dist test
daidokoro Jul 31, 2024
c5db928
updated readme
daidokoro Jul 31, 2024
278b80b
Merge branch 'main' into cds-1320
daidokoro Jul 31, 2024
bfc4ed9
fix default
daidokoro Aug 7, 2024
5868330
update default
daidokoro Aug 7, 2024
a5e094f
removed featuregate
daidokoro Aug 8, 2024
12ad335
add function to test
daidokoro Aug 13, 2024
e50619b
updated tests
daidokoro Aug 13, 2024
c05bf13
- updated algorithms, fixed minor issues with accuracy of conversion
daidokoro Aug 13, 2024
7448835
updated readme
daidokoro Aug 13, 2024
be61c0d
Update processor/transformprocessor/README.md
daidokoro Aug 13, 2024
e159310
Merge branch 'main' into cds-1320
daidokoro Aug 13, 2024
f21fb3e
Merge branch 'main' into cds-1320
daidokoro Aug 15, 2024
e35ffd7
Merge branch 'main' into cds-1320
daidokoro Aug 16, 2024
06232ff
Merge branch 'main' into cds-1320
daidokoro Aug 22, 2024
3f57f36
Update processor/transformprocessor/internal/metrics/func_convert_exp…
daidokoro Sep 3, 2024
c19b23d
update function name and add warning message
daidokoro Sep 3, 2024
33b636e
fixed function name
daidokoro Sep 3, 2024
7b5447f
remove GOTO from distFn loop and changed transform function name
daidokoro Sep 3, 2024
d080b9e
adjust function name
daidokoro Sep 3, 2024
0220d7d
Merge branch 'main' into cds-1320
daidokoro Sep 3, 2024
456a464
fix typos
daidokoro Sep 3, 2024
d073e86
Merge branch 'cds-1320' of github.com:coralogix/opentelemetry-collect…
daidokoro Sep 3, 2024
3039e90
fix linting issues
daidokoro Sep 5, 2024
711a56f
Merge branch 'main' into cds-1320
daidokoro Sep 5, 2024
e027923
Merge branch 'main' into cds-1320
daidokoro Sep 13, 2024
71db112
go mod tidy on transformprocessor
daidokoro Sep 16, 2024
a34d136
Merge branch 'main' into cds-1320
daidokoro Sep 16, 2024
f8a29b6
Merge branch 'main' into cds-1320
daidokoro Sep 17, 2024
4c100e0
Merge branch 'main' into cds-1320
daidokoro Sep 17, 2024
f3a98d3
go mod tidy
daidokoro Sep 17, 2024
3fac936
fixed linting issue
daidokoro Sep 18, 2024
6e59f0a
Update processor/transformprocessor/internal/metrics/func_convert_exp…
daidokoro Sep 18, 2024
c1fa8ea
Update processor/transformprocessor/internal/metrics/func_convert_exp…
daidokoro Sep 18, 2024
5bb3c30
go fmt
daidokoro Sep 18, 2024
3543bb6
Merge branch 'main' into cds-1320
daidokoro Sep 20, 2024
614dcad
gci-ed
daidokoro Sep 20, 2024
a5f43c0
Merge branch 'main' into cds-1320
daidokoro Sep 20, 2024
5870ff4
make crosslink
daidokoro Sep 20, 2024
27 changes: 27 additions & 0 deletions .chloggen/cds-1320.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: 'enhancement'

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: processor/transform

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add custom function to the transform processor to convert exponential histograms to explicit histograms."

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33827]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
81 changes: 81 additions & 0 deletions processor/transformprocessor/README.md
@@ -220,6 +220,8 @@ In addition to OTTL functions, the processor defines its own functions to help w
- [copy_metric](#copy_metric)
- [scale_metric](#scale_metric)
- [aggregate_on_attributes](#aggregate_on_attributes)
- [convert_exponential_histogram_to_histogram](#convert_exponential_histogram_to_histogram)


### convert_sum_to_gauge

@@ -355,6 +357,84 @@ Examples:

- `copy_metric(desc="new desc") where description == "old desc"`


### convert_exponential_histogram_to_histogram

__Warning:__ The approach used in this function to convert exponential histograms to explicit histograms __is not__ part of the __OpenTelemetry Specification__.

`convert_exponential_histogram_to_histogram(distribution, [ExplicitBounds])`

The `convert_exponential_histogram_to_histogram` function converts an ExponentialHistogram to an Explicit (_normal_) Histogram.

This function requires 2 arguments:

- `distribution` - This argument defines the distribution algorithm used to allocate the exponential histogram datapoints into a new Explicit Histogram. There are 4 options:
<br>
- __upper__ - This approach identifies the highest possible value of each exponential bucket (_the upper bound_) and uses it to distribute the datapoints by comparing the upper bound of each bucket with the ExplicitBounds provided. This approach works better for small/narrow exponential histograms where the difference between the upper and lower bounds is small.

_For example, given:_
1. count = 10
2. Boundaries: [5, 10, 15, 20, 25]
3. Upper Bound: 15
_Process:_
4. Start with zeros: [0, 0, 0, 0, 0]
5. Iterate the boundaries and compare $upper = 15$ with each boundary:
- $15>5$ (_skip_)
- $15>10$ (_skip_)
- $15<=15$ (allocate count to this boundary)
6. Allocate count: [0, 0, __10__, 0, 0]
7. Final Counts: [0, 0, __10__, 0, 0]
<br>
- __midpoint__ - This approach works in a similar way to the __upper__ approach, but instead of using the upper bound, it uses the midpoint of each exponential bucket. The midpoint is identified by calculating the average of the upper and lower bounds. This approach also works better for small/narrow exponential histograms.
<br>

>The __uniform__ and __random__ distribution algorithms both utilise the concept of intersecting boundaries.
Intersecting boundaries are any boundaries in the `boundaries` array that fall between, or on, the lower and upper bounds of an Exponential Histogram bucket.
_For example:_ if you have an Exponential Histogram bucket with a lower bound of 10 and an upper bound of 20, and your boundaries array is [5, 10, 15, 20, 25], the intersecting boundaries are 10, 15, and 20 because they lie within the range [10, 20].
<br>
- __uniform__ - This approach distributes the datapoints for each bucket uniformly across the intersecting __ExplicitBounds__. The algorithm works as follows:

- If there are valid intersecting boundaries, the function evenly distributes the count across these boundaries.
- Calculate the count to be allocated to each boundary.
- If there is a remainder after dividing the count equally, it distributes the remainder by incrementing the count for some of the boundaries until the remainder is exhausted.

_For example, given:_
1. count = 10
2. Exponential Histogram Bounds: [10, 20]
3. Boundaries: [5, 10, 15, 20, 25]
4. Intersecting Boundaries: [10, 15, 20]
5. Number of Intersecting Boundaries: 3
6. Using the formula: $count / numOfIntersections = 10 / 3 = 3$, remainder $1$
_Uniform Allocation:_
7. Start with zeros: [0, 0, 0, 0, 0]
8. Allocate 3 to each: [0, 3, 3, 3, 0]
9. Distribute the remainder of 1: [0, 4, 3, 3, 0]
10. Final Counts: [0, 4, 3, 3, 0]
<br>
- __random__ - This approach distributes the datapoints for each bucket randomly across the intersecting __ExplicitBounds__. This approach works in a similar manner to the uniform distribution algorithm with the main difference being that points are distributed randomly instead of uniformly. This works as follows:
- If there are valid intersecting boundaries, calculate the proportion of the count that should be allocated to each boundary based on the overlap of the boundary with the provided range (lower to upper).
- For each boundary, a random fraction of the calculated proportion is allocated.
- Any remaining count (_due to rounding or random distribution_) is then distributed randomly among the intersecting boundaries.
- If the bucket range does not intersect with any boundaries, the entire count is assigned to the start boundary.
<br>
- `ExplicitBounds` represents the list of bucket boundaries for the new histogram. This argument is __required__ and __cannot be empty__.
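
As a rough illustration of the `upper`, `midpoint` and `uniform` strategies described above, here is a minimal Python sketch that allocates the count of a single exponential bucket. It is not the processor's Go implementation. The function names are hypothetical, the result carries one extra overflow bucket (an explicit histogram has `len(bounds) + 1` buckets, which the worked examples above omit), and the fallback used by `allocate_uniform` when no boundary intersects the bucket is an assumption of this sketch.

```python
from bisect import bisect_left


def allocate_by_point(point, count, bounds):
    """Put `count` into the explicit bucket that contains `point`.

    Bucket i covers (bounds[i-1], bounds[i]]; the final entry is the
    overflow bucket for values above the last boundary.
    """
    counts = [0] * (len(bounds) + 1)
    # bisect_left returns the index of the first boundary >= point.
    counts[bisect_left(bounds, point)] += count
    return counts


def allocate_upper(lower, upper, count, bounds):
    """`upper` strategy: compare the bucket's upper bound with the boundaries."""
    return allocate_by_point(upper, count, bounds)


def allocate_midpoint(lower, upper, count, bounds):
    """`midpoint` strategy: use the average of the lower and upper bounds."""
    return allocate_by_point((lower + upper) / 2, count, bounds)


def allocate_uniform(lower, upper, count, bounds):
    """`uniform` strategy: split `count` evenly over intersecting boundaries."""
    hits = [i for i, b in enumerate(bounds) if lower <= b <= upper]
    if not hits:
        # Assumption of this sketch: with no intersecting boundary,
        # fall back to the `upper` strategy.
        return allocate_by_point(upper, count, bounds)
    counts = [0] * (len(bounds) + 1)
    share, remainder = divmod(count, len(hits))
    for position, i in enumerate(hits):
        # The first `remainder` boundaries each receive one extra datapoint.
        counts[i] = share + (1 if position < remainder else 0)
    return counts
```

With the values from the worked examples, `allocate_upper(10, 15, 10, [5, 10, 15, 20, 25])` yields `[0, 0, 10, 0, 0, 0]` (the lower bound is unused by `upper`) and `allocate_uniform(10, 20, 10, [5, 10, 15, 20, 25])` yields `[0, 4, 3, 3, 0, 0]`. The trailing entry is the overflow bucket.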

__WARNINGS:__

- The process of converting an ExponentialHistogram to an Explicit Histogram is not perfect and may result in a loss of precision. It is important to define an appropriate set of bucket boundaries and identify the best distribution approach for your data in order to minimize this loss.

For example, selecting Boundaries that are too high or too low may result in histogram buckets that are too wide or too narrow, respectively.

- __Negative Bucket Counts__ are not supported in Explicit Histograms; as such, negative bucket counts are ignored.

- __ZeroCounts__ are only allocated if the ExplicitBounds array contains a zero boundary. That is, if the Explicit Boundaries that you provide do not start with `0`, the function will not allocate any zero counts from the Exponential Histogram.
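
When choosing `ExplicitBounds`, it can help to know how wide the source buckets are. The sketch below assumes the bucket mapping from the OpenTelemetry metrics data model, where `base = 2^(2^-scale)` and the positive bucket at absolute index `i` (the histogram's `offset` plus the position in the bucket-count array) covers `(base^i, base^(i+1)]`. The helper name `bucket_bounds` is hypothetical and Python is used only for illustration.

```python
def bucket_bounds(scale, index):
    """Return (lower, upper) for a positive exponential-histogram bucket.

    Assumes the OpenTelemetry mapping: base = 2 ** (2 ** -scale), and the
    bucket at `index` covers (base ** index, base ** (index + 1)].
    """
    base = 2.0 ** (2.0 ** -scale)
    return base ** index, base ** (index + 1)


def bucket_width(scale, index):
    """Width of the bucket; it grows geometrically with `index`."""
    lower, upper = bucket_bounds(scale, index)
    return upper - lower
```

For instance, at scale 0 the bucket at index 3 spans (8, 16], so explicit boundaries spaced more finely than that range cannot recover detail the source histogram never had.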

This function should only be used when Exponential Histograms are not suitable for the downstream consumers or if upstream metric sources are unable to generate Explicit Histograms.

__Example__:

- `convert_exponential_histogram_to_histogram("random", [0.0, 10.0, 100.0, 1000.0, 10000.0])`

### scale_metric

`scale_metric(factor, Optional[unit])`
@@ -415,6 +495,7 @@ statements:

To aggregate only using a specified set of attributes, you can use `keep_matching_keys`.


## Examples

### Perform transformation if field does not exist