I'm seeing about 10 log lines per hour like "failed to find dimensions for key" when using servicegraph. Looking around the code a bit, I think I found an issue that might be the cause:
The cleanCache function cleans up series that haven't been used in 15 minutes. It does this by:
1. Deleting the series from p.keyToMetric (which holds the dimensions for the metric), then
2. Deleting it from all the metric series maps, e.g. p.reqTotal
In parallel, the metrics are collected by:
1. Looping through all items in the metric series maps, e.g. p.reqTotal
2. For each item, getting the metric's dimensions from p.keyToMetric
Because these occur in opposite orders, we can get into a sticky situation: the collector finds a series in a metric series map after its dimensions have already been deleted, and errors out when looking them up.
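To make the interleaving concrete, here is a minimal single-threaded sketch that replays it step by step. It is not the connector's actual code: the `store` type, its `collect` method, and the `errNoDims` value are hypothetical, the map names are borrowed from the description above, and the value types are simplified. Locking is left out entirely, so this only models the ordering problem.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a simplified, hypothetical stand-in for the connector's state.
type store struct {
	keyToMetric map[string]map[string]string // series key -> dimensions
	reqTotal    map[string]int64             // series key -> request count
}

var errNoDims = errors.New("failed to find dimensions for key")

// collect follows the collection order described above: iterate the series
// map first, then look up the dimensions for each key.
func (s *store) collect() error {
	for key := range s.reqTotal {
		if _, ok := s.keyToMetric[key]; !ok {
			return fmt.Errorf("%w %s", errNoDims, key)
		}
	}
	return nil
}

func main() {
	s := &store{
		keyToMetric: map[string]map[string]string{"a": {"client": "x", "server": "y"}},
		reqTotal:    map[string]int64{"a": 3},
	}

	// Cleanup step 1: the dimensions are deleted first.
	delete(s.keyToMetric, "a")

	// A collection landing between the two cleanup steps still sees the
	// series in reqTotal, but its dimensions are already gone.
	fmt.Println("between cleanup steps:", s.collect())

	// Cleanup step 2: the series itself is deleted.
	delete(s.reqTotal, "a")
	fmt.Println("after cleanup:", s.collect())
}
```

The first call to `collect` returns an error and the second does not, which matches the window described above.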
I can see a few ways around this:
- Reverse the order of the operations in the cleanup function
- Make the collection functions skip series whose dimensions have already been cleaned up
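Both options can be sketched on a simplified model. This is an illustration under assumptions, not a patch for the connector: `store`, `cleanSeries`, and `collect` are hypothetical names, only one series map is modeled, and locking is omitted. Go maps are not safe for concurrent use, so real code still needs synchronization, and whether reordering alone closes the window depends on how the locks are held, which this sketch does not model.

```go
package main

import "fmt"

// store is a simplified, hypothetical stand-in for the connector's state.
type store struct {
	keyToMetric map[string]map[string]string // series key -> dimensions
	reqTotal    map[string]int64             // series key -> request count
}

// cleanSeries sketches the first option: delete the series before its
// dimensions, so a key present in reqTotal always has dimensions.
func (s *store) cleanSeries(key string) {
	delete(s.reqTotal, key)
	delete(s.keyToMetric, key)
}

// collect sketches the second option: a series whose dimensions are
// already gone is skipped instead of aborting the whole collection.
func (s *store) collect() map[string]int64 {
	out := make(map[string]int64, len(s.reqTotal))
	for key, count := range s.reqTotal {
		if _, ok := s.keyToMetric[key]; !ok {
			continue // stale series that is being cleaned up
		}
		out[key] = count
	}
	return out
}

func main() {
	s := &store{
		keyToMetric: map[string]map[string]string{
			"a": {"client": "x"},
			"b": {"client": "y"},
		},
		reqTotal: map[string]int64{"a": 1, "b": 2},
	}

	// Simulate the old cleanup order being halfway through for "b".
	delete(s.keyToMetric, "b")
	fmt.Println(s.collect()) // "b" is skipped, "a" is still collected

	// The reordered cleanup removes "a" from both maps.
	s.cleanSeries("a")
	fmt.Println(len(s.reqTotal), len(s.keyToMetric))
}
```

The two options are not mutually exclusive; skipping stale series also protects the collector against any other path that removes dimensions first.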
Steps to Reproduce
Kinda tricky, since it's a race condition...
Expected Result
No errors logged, all metrics collected and exported.
Actual Result
Errors logged like:
failed to build metrics: failed to find dimensions for key ...
and presumably some metrics not being exported (since the collection function will return when it encounters this error, skipping subsequent metric series).
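The effect of returning on the first missing key can be shown with a small sketch. The function `collectUntilError` and its arguments are hypothetical, and the keys are sorted only to make the example deterministic (Go map iteration order is not fixed, so in practice the number of skipped series would vary from run to run).

```go
package main

import (
	"fmt"
	"sort"
)

// collectUntilError is a hypothetical collector that stops at the first key
// without dimensions and reports how many series it emitted before that.
func collectUntilError(series map[string]int64, hasDims map[string]bool) (int, error) {
	keys := make([]string, 0, len(series))
	for k := range series {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order, for a reproducible example only

	emitted := 0
	for _, k := range keys {
		if !hasDims[k] {
			// Early return: every key after k is never visited.
			return emitted, fmt.Errorf("failed to find dimensions for key %s", k)
		}
		emitted++
	}
	return emitted, nil
}

func main() {
	series := map[string]int64{"a": 1, "b": 2, "c": 3}
	hasDims := map[string]bool{"a": true, "c": true} // "b" is mid-cleanup

	n, err := collectUntilError(series, hasDims)
	fmt.Println(n, err)
}
```

Here only "a" is emitted: "b" triggers the error, and "c" is dropped even though its dimensions are intact.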
Component(s)
connector/servicegraph
Collector version
v0.96.0
Environment information
Environment
Using the opentelemetry-collector-contrib image.
OpenTelemetry Collector configuration
Log output
Additional context
No response