The windowseventlog receiver has a configuration parameter max_reads which determines the max number of events read from the event channel in a poll interval. In cases where the number of events being added to the channel in a poll interval is greater than max_reads, the receiver can fall behind. In drastic situations the agent can fall behind severely, which was the case in #36472. In this situation, it's not clear that the receiver is falling behind, or that this is why the newest events aren't being read from the channel.
I'm proposing adding some sort of mechanism for determining when the receiver is maxing out the number of events it can read from the channel. Maybe logging a debug log every time the number of events returned by evtNext() is equal to max_reads in this section of code. Or defining a monotonic cumulative sum metric that gets incremented every time this occurs, instead of a debug log.
Regardless of the mechanism, a way for the receiver to indicate it may be falling behind reading from an event log channel would go a long way in troubleshooting situations where it seems like the receiver is failing.
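To make the proposal concrete, here is a rough sketch of the check in plain Go. It is not the receiver's actual code: `saturationTracker`, `observe`, and `saturatedPolls` are hypothetical names, and the printed line stands in for whatever log call and counter metric the receiver would really use.

```go
package main

import "fmt"

// saturationTracker is a hypothetical helper, not part of the receiver.
// It counts how many polls returned exactly max_reads events.
type saturationTracker struct {
	maxReads       int   // configured max_reads value
	saturatedPolls int64 // monotonic count of polls that hit the limit
}

// observe records the number of events returned by one poll and reports
// whether that poll hit the max_reads limit.
func (t *saturationTracker) observe(eventsRead int) bool {
	if t.maxReads <= 0 || eventsRead < t.maxReads {
		return false
	}
	t.saturatedPolls++
	return true
}

func main() {
	t := &saturationTracker{maxReads: 100}
	// Simulated event counts from four consecutive polls.
	for _, n := range []int{12, 100, 100, 37} {
		if t.observe(n) {
			// Stand-in for a debug or info log in the receiver.
			fmt.Printf("read %d events, equal to max_reads; receiver may be falling behind\n", n)
		}
	}
	fmt.Println("saturated polls:", t.saturatedPolls)
}
```

A single saturated poll is not proof of a backlog, since the channel may have held exactly max_reads events. A count that keeps climbing across consecutive polls is the stronger signal.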
> logging a debug log every time the number of events returned by evtNext() is equal to max_reads
This seems like a simple way to warn about the issue. It might make sense to use a higher level such as info, since it may indicate a capacity problem. Warn is arguably appropriate but may be too high since the issue may resolve on its own.
> monotonic cumulative sum metric that gets incremented every time this occurs, instead of a debug log
This also seems reasonable. Another option would be a histogram where each data point describes the number of events returned by one call to evtNext().
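For illustration, the histogram option could look roughly like the sketch below. It uses a hand-rolled bucket counter instead of a real metrics SDK, and `readHistogram`, `newReadHistogram`, `record`, and the bucket bounds are all made up for the example.

```go
package main

import "fmt"

// readHistogram is a hypothetical stand-in for a real metrics histogram.
// Each recorded value is the number of events returned by one read call.
type readHistogram struct {
	bounds []int   // inclusive upper bounds, in ascending order
	counts []int64 // one count per bound, plus a final overflow bucket
}

func newReadHistogram(bounds []int) *readHistogram {
	return &readHistogram{bounds: bounds, counts: make([]int64, len(bounds)+1)}
}

// record places eventsRead in the first bucket whose bound is >= eventsRead,
// or in the overflow bucket if it exceeds every bound.
func (h *readHistogram) record(eventsRead int) {
	for i, b := range h.bounds {
		if eventsRead <= b {
			h.counts[i]++
			return
		}
	}
	h.counts[len(h.bounds)]++
}

func main() {
	// Assumed bounds; the last one matches a max_reads of 100.
	h := newReadHistogram([]int{0, 10, 50, 100})
	for _, n := range []int{0, 3, 100, 100, 42} {
		h.record(n)
	}
	for i, b := range h.bounds {
		fmt.Printf("<= %d events: %d reads\n", b, h.counts[i])
	}
}
```

Compared to a plain counter, the distribution shows how close reads typically come to max_reads, not just how often they hit it.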
At a minimum, we should add the log, but either metric seems reasonable too. Curious what @pjanotti thinks.
I think we need both the metric and the log: the former so that alerts can be created, the latter to clarify the issue.
@dpaasman00 one thing that is not clear to me from #36472: it seems that the receiver stopped sending events from the channel, is that correct? My expectation would be that it was falling behind but kept sending events from the channel. Can you confirm which is actually the case?
Component(s)
pkg/stanza, receiver/windowseventlog