-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
winlogbeat/eventlog: add initial event processing metrics support #33922
Conversation
736b923
to
18d7626
Compare
Pinging @elastic/security-external-integrations (Team:Security-External Integrations) |
This adds metrics for discarded_events_total, errors_total, received_events_count, received_events_total and source_lag_time, and also a histogram for non-zero event batch period intended to make burstiness of log event creation visible. The metrics are visible for all winlogbeat/eventlog client packages, e.g. the filebeat winlog input. source_lag_count is left for a later PR.
3e062fe
to
2580c12
Compare
This only operates on Windows hosts and is best effort only.
2580c12
to
6a9e46f
Compare
winlogbeat/eventlog/poll_windows.go
Outdated
} | ||
defer windows.CloseHandle(event) | ||
|
||
s, err := win.Subscribe(0, event, "", "", 0, win.EvtSubscribeStartAtOldestRecord) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might not be doing what I would expect. This should be attempting to get the newest event's record ID to compare with reader's current event ID. I'm not sure if this can be done exclusively through flags or if an EvtSeek
is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I was off-by-sign.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right now; the docs are weak, but they say that the EvtSeek
d handle should have come from a call to EvtQuery
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not right.
1b6ca5b
to
3e3f63c
Compare
3e3f63c
to
e790fbe
Compare
if id == "" { | ||
return nil | ||
} | ||
reg, unreg := inputmon.NewInputRegistry(name, id, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make the input name be a constant winlog
. This way when viewing this in agent we have input
values like winlog
, udp
, cel
, lumberjack
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should happen in the filebeat end, since this will also be being called from winlogbeat.
winlogbeat/eventlog/eventlog.go
Outdated
Register("histogram", metrics.NewHistogram(out.batchPeriod)) | ||
|
||
if poll > 0 && runtime.GOOS == "windows" { | ||
out.lastRecordID = &atomic.Uint64{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out.sourceLagN
is never initialized which will cause a panic when something tries to take a snapshot of the metrics. You can see this if you start Winlogbeat with logging.metrics.namespaces: [stats, dataset]
and wait 30s.
And I'm thinking that since this metric updates every 1 minute that we need a smaller reservoir size like say 15 so we are looking at samples over the last 15 minutes.
out.lastRecordID = &atomic.Uint64{} | |
out.lastRecordID = &atomic.Uint64{} | |
out.sourceLagN = metrics.NewUniformSample(15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
winlogbeat/eventlog/eventlog.go
Outdated
case <-t.C: | ||
last, err := lastEvent(name, work[:], buf) | ||
if err != nil { | ||
m.logError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While testing I saw the errors_total
going up and traced it to here with and error string of The handle is invalid.
I never saw source_lag_count get updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is currently not correct, and I don't see from the MS documentation how to make it correct.
This reverts commit 6a9e46f. Reason for revert: I have not been able to get this particular functionality to work, so we'll leave it out for now and come back to it later.
Kudos, SonarCloud Quality Gate passed! |
…3922) This adds metrics for discarded_events_total, errors_total, received_events_count, received_events_total and source_lag_time, and also a histogram for non-zero event batch period intended to make burstiness of log event creation visible. The metrics are visible for all winlogbeat/eventlog client packages, e.g. the filebeat winlog input. source_lag_count is left for a later PR.
What does this PR do?
This adds metrics for discarded_events_total, errors_total, received_events_count,
received_events_total and source_lag_time, and also a histogram for non-zero event
batch period intended to make burstiness of log event creation visible.
The metrics are visible for all winlogbeat/eventlog client packages, e.g. the
filebeat winlog input.
source_lag_count is left for a later PR.source_lag_count is in a separate commit since it was noted as being potentially not worth the trouble. The code is not too complex and with a reasonably infrequent polling should not be a significant host load (initially set to once per minute).Why is it important?
This allows us to help users configure their systems to match the requirements that they have.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs
Attaching sample output from the /dataset endpoint: dataset.json.txt