[pkg/stanza] Windows Input Operator falls behind reading from channel #36491

dpaasman00 · 2024-11-21T18:39:23Z

Component(s)

pkg/stanza, receiver/windowseventlog

Describe the issue you're reporting

The windowseventlog receiver has a configuration parameter max_reads, which sets the maximum number of events read from the event channel in a poll interval. When more events are added to the channel in a poll interval than max_reads allows, the receiver can fall behind. In drastic situations the agent can fall behind severely, which was the case in #36472. In that situation, nothing makes it clear that the receiver is falling behind, or that this is why the newest events aren't being read from the channel.

I'm proposing adding some sort of mechanism for determining when the receiver is maxing out the number of events it can read from the channel. Maybe logging a debug log every time the number of events returned by evtNext() is equal to max_reads in this section of code. Or defining a monotonic cumulative sum metric that gets incremented every time this occurs, instead of a debug log.

Regardless of the mechanism, a way for the receiver to indicate it may be falling behind reading from an event log channel would go a long way in troubleshooting situations where it seems like the receiver is failing.
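
A minimal sketch of the proposal, for illustration only: it is not the actual pkg/stanza code. The type `saturationTracker` and its `observe` method are hypothetical names, the calls to evtNext() are simulated by a list of batch sizes, and the max_reads value of 100 is just an example. It shows both mechanisms side by side: a message when a read returns a full batch, and a monotonic counter of how often that happened.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// saturationTracker is a hypothetical helper, not part of pkg/stanza.
// It counts how many reads returned a full batch of max_reads events.
type saturationTracker struct {
	maxReads  int
	saturated atomic.Int64 // monotonic count of full-batch reads
}

// observe records the size of one read and reports whether the batch
// was full, which suggests more events may still be waiting.
func (t *saturationTracker) observe(eventsRead int) bool {
	if eventsRead < t.maxReads {
		return false
	}
	t.saturated.Add(1)
	return true
}

func main() {
	t := &saturationTracker{maxReads: 100} // example value

	// Simulated batch sizes standing in for successive evtNext() results.
	for _, n := range []int{12, 100, 100, 40} {
		if t.observe(n) {
			fmt.Printf("read %d events, equal to max_reads; receiver may be falling behind\n", n)
		}
	}
	fmt.Println("saturated reads:", t.saturated.Load())
}
```

In a real implementation the message would presumably go through the component's logger, and the counter would be reported as a collector metric rather than kept in a local variable.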

github-actions · 2024-11-21T18:39:42Z

Pinging code owners:

pkg/stanza: @djaglowski
receiver/windowseventlog: @armstrmi @pjanotti

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski · 2024-12-03T14:31:24Z

> logging a debug log every time the number of events returned by evtNext() is equal to max_reads

This seems like a simple way to warn about the issue. It might make sense to use a higher level such as info, since it may indicate a capacity problem. Warn is arguably appropriate but may be too high since the issue may resolve on its own.

> monotonic cumulative sum metric that gets incremented every time this occurs, instead of a debug log

This also seems reasonable. Another option would be a histogram where each data point describes the number of events returned by one call to evtNext().

At a minimum, we should add the log, but either metric seems reasonable too. Curious what @pjanotti thinks

pjanotti · 2024-12-04T18:55:53Z

I think we need both the metric and the log: the former to let alerts be created, the latter to clarify the issue.

@dpaasman00 one thing that is not clear to me from #36472: it seems that the receiver stopped sending events from the channel, is that correct? My expectation would be that it was falling behind but kept sending events from the channel. Can you confirm which is actually the case?

dpaasman00 added the needs triage New item requiring triage label Nov 21, 2024

github-actions bot added pkg/stanza receiver/windowseventlog labels Nov 21, 2024

djaglowski added collector-observability and removed needs triage New item requiring triage labels Nov 22, 2024
