Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

winlogbeat/eventlog: add initial event processing metrics support #33922

Merged
merged 8 commits into from
Jan 10, 2023

Conversation

efd6
Copy link
Contributor

@efd6 efd6 commented Dec 2, 2022

What does this PR do?

This adds metrics for discarded_events_total, errors_total, received_events_count,
received_events_total and source_lag_time, and also a histogram for non-zero event
batch period intended to make burstiness of log event creation visible.

The metrics are visible for all winlogbeat/eventlog client packages, e.g. the
filebeat winlog input.

source_lag_count is left for a later PR. source_lag_count is in a separate commit since it was noted as being potentially not worth the trouble. The code is not too complex and with a reasonably infrequent polling should not be a significant host load (initially set to once per minute).

Why is it important?

This allows us to help users configure their systems to match the requirements that they have.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Recommend review by commit since the first commit is colour and movement.

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Attaching sample output from the /dataset endpoint: dataset.json.txt

@efd6 efd6 added enhancement Filebeat Filebeat Winlogbeat Team:Security-External Integrations backport-skip Skip notification from the automated backport with mergify 8.7-candidate labels Dec 2, 2022
@efd6 efd6 self-assigned this Dec 2, 2022
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Dec 2, 2022
@efd6 efd6 force-pushed the st5527-winlogbeat branch 2 times, most recently from 736b923 to 18d7626 Compare December 2, 2022 00:49
@elasticmachine
Copy link
Collaborator

elasticmachine commented Dec 2, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-01-06T05:31:35.479+0000

  • Duration: 40 min 53 sec

Test stats 🧪

Test Results
Failed 0
Passed 930
Skipped 9
Total 939

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@efd6 efd6 marked this pull request as ready for review December 2, 2022 01:32
@efd6 efd6 requested a review from a team as a code owner December 2, 2022 01:32
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

This adds metrics for discarded_events_total, errors_total, received_events_count,
received_events_total and source_lag_time, and also a histogram for non-zero event
batch period intended to make burstiness of log event creation visible.

The metrics are visible for all winlogbeat/eventlog client packages, e.g. the
filebeat winlog input.

source_lag_count is left for a later PR.
@efd6 efd6 force-pushed the st5527-winlogbeat branch 3 times, most recently from 3e062fe to 2580c12 Compare December 4, 2022 23:54
This only operates on Windows hosts and is best effort only.
}
defer windows.CloseHandle(event)

s, err := win.Subscribe(0, event, "", "", 0, win.EvtSubscribeStartAtOldestRecord)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might not be doing what I would expect. This should be attempting to get the newest event's record ID to compare with reader's current event ID. I'm not sure if this can be done exclusively through flags or if an EvtSeek is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I was off-by-sign.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right now; the docs are weak, but they say that the EvtSeekd handle should have come from a call to EvtQuery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not right.

@efd6 efd6 force-pushed the st5527-winlogbeat branch from 1b6ca5b to 3e3f63c Compare December 6, 2022 06:52
@efd6 efd6 force-pushed the st5527-winlogbeat branch from 3e3f63c to e790fbe Compare December 6, 2022 10:26
if id == "" {
return nil
}
reg, unreg := inputmon.NewInputRegistry(name, id, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the input name be a constant winlog. This way when viewing this in agent we have input values like winlog, udp, cel, lumberjack, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should happen in the filebeat end, since this will also be being called from winlogbeat.

Register("histogram", metrics.NewHistogram(out.batchPeriod))

if poll > 0 && runtime.GOOS == "windows" {
out.lastRecordID = &atomic.Uint64{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out.sourceLagN is never initialized which will cause a panic when something tries to take a snapshot of the metrics. You can see this if you start Winlogbeat with logging.metrics.namespaces: [stats, dataset] and wait 30s.

And I'm thinking that since this metric updates every 1 minute that we need a smaller reservoir size like say 15 so we are looking at samples over the last 15 minutes.

Suggested change
out.lastRecordID = &atomic.Uint64{}
out.lastRecordID = &atomic.Uint64{}
out.sourceLagN = metrics.NewUniformSample(15)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

case <-t.C:
last, err := lastEvent(name, work[:], buf)
if err != nil {
m.logError(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing I saw the errors_total going up and traced it to here with and error string of The handle is invalid.

I never saw source_lag_count get updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is currently not correct, and I don't see from the MS documentation how to make it correct.

efd6 added 2 commits December 7, 2022 07:25
This reverts commit 6a9e46f.

Reason for revert: I have not been able to get this particular functionality to
work, so we'll leave it out for now and come back to it later.
@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 8, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@efd6 efd6 requested a review from andrewkroh December 8, 2022 19:35
@efd6 efd6 merged commit 64d8649 into elastic:main Jan 10, 2023
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
…3922)

This adds metrics for discarded_events_total, errors_total, received_events_count,
received_events_total and source_lag_time, and also a histogram for non-zero event
batch period intended to make burstiness of log event creation visible.

The metrics are visible for all winlogbeat/eventlog client packages, e.g. the
filebeat winlog input.

source_lag_count is left for a later PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.7-candidate backport-skip Skip notification from the automated backport with mergify enhancement Filebeat Filebeat Winlogbeat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants