
Allow Beats to increase publisher internal queue size #22650

Merged
merged 8 commits into elastic:master on Dec 3, 2020

Conversation

adriansr
Contributor

@adriansr adriansr commented Nov 18, 2020

What does this PR do?

This is a mitigation for a problem affecting Beats that use PublishMode=DropIfFull (currently only Packetbeat).

This parameter is meant to drop events when the configured queue is full. However, due to the way the mem and spool queues are implemented, it actually drops events when the internal "publisher" queue is full. Because of this queue's small hardcoded size (20 events), it can fill up easily even when the main queue still has room, simply because a burst of events is published faster than the consumer goroutine can drain it.
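For context, the publish mode in question is chosen when a Beat connects to the publishing pipeline. A minimal sketch using libbeat's beat.ClientConfig (the helper name and surrounding setup here are illustrative, not Packetbeat's actual code):

```go
package example

import "github.com/elastic/beats/v7/libbeat/beat"

// connectDropIfFull opens a pipeline client that drops events instead of
// blocking when the queue reports it is full. Before this PR, "full"
// effectively meant the small internal publisher queue, not the
// configured main queue.
func connectDropIfFull(pipeline beat.Pipeline) (beat.Client, error) {
	return pipeline.ConnectWith(beat.ClientConfig{
		PublishMode: beat.DropIfFull,
	})
}
```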

The changes in this PR allow Beats to set a parameter in instance.Settings to request a larger internal queue, and Packetbeat is updated to request a queue of up to 400 events. All other Beats remain unchanged.
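A minimal sketch of the Packetbeat side of this change, assuming the setting ends up named InputQueueSize as agreed in the review below (the exact field name and wiring are illustrative):

```go
package cmd

import "github.com/elastic/beats/v7/libbeat/cmd/instance"

// Sketch: Packetbeat requests a larger publisher input queue through
// instance.Settings. Other Beats leave the field unset (zero) and keep
// the previous default of 20 events.
var settings = instance.Settings{
	Name:           "packetbeat",
	InputQueueSize: 400, // allow bursts of up to 400 events
}
```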

Why is it important?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist


How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@botelastic added the needs_team label (indicates that the issue/PR needs a Team:* label) Nov 18, 2020
@botelastic removed the needs_team label Nov 18, 2020
@adriansr adriansr requested review from faec and urso November 18, 2020 13:07
@elasticmachine
Collaborator

elasticmachine commented Nov 18, 2020

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #22650 updated

  • Start Time: 2020-12-03T10:37:47.139+0000

  • Duration: 55 min 6 sec

Test stats 🧪

Test Results
  • Failed: 0
  • Passed: 16895
  • Skipped: 1373
  • Total: 18268

Steps errors: 2

Terraform Apply on x-pack/metricbeat/module/aws

  • Took 0 min 15 sec

Terraform Apply on x-pack/metricbeat/module/aws

  • Took 0 min 15 sec

💚 Flaky test report

Tests succeeded.


@adriansr adriansr marked this pull request as ready for review November 19, 2020 17:00
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@@ -122,3 +127,15 @@ type Batch interface {
Events() []publisher.Event
ACK()
}

// AdjustInternalQueueSize decides the size for the internal queue used by most queue implementations.
func AdjustInternalQueueSize(requested, mainQueueSize int) (actual int) {
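The diff shows only the signature of the new helper. A possible body, inferred from the discussion in this PR (a floor at the old hardcoded size of 20, and a cap relative to the main queue so the input buffer stays comparatively small), is sketched below; the exact constants and cap are assumptions, not the merged code:

```go
package queue

// AdjustInternalQueueSize decides the size for the internal queue used by
// most queue implementations. Sketch of a possible implementation.
func AdjustInternalQueueSize(requested, mainQueueSize int) (actual int) {
	const minQueueSize = 20 // the previously hardcoded size
	actual = requested
	if actual < minQueueSize {
		// Never shrink below the old default.
		actual = minQueueSize
	}
	// Assumption: cap the internal queue relative to the main queue so
	// it never dominates memory use.
	if limit := mainQueueSize / 2; limit >= minQueueSize && actual > limit {
		actual = limit
	}
	return actual
}
```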
Contributor

Actually, sorry for the shuffle but I think I'd prefer if this was back within memqueue... this calculation is nearly a noop for the spool, which behaves very differently (and will be going away in the future anyway), so the real use of this helper is just to cap the intake size for the memory queue to something reasonable.

In an ideal world I'd actually like the flag itself to be in memqueue.Settings (like the analogous WriteAheadLimit flag is on the disk queue), but right now there's no clean way to do that without exposing the setting to the user, which seems undesirable. I don't want to gate the change on fixing that plumbing, so it makes sense to me to put it on the Beat config, but I'd still prefer to keep the scope narrow if possible.

Contributor Author

I moved this to memqueue. Also renamed all references to "InternalQueueSize" to "InputQueueSize", which I think better defines it, since "internal queue" is what we call the main event queue (mem, spool...) in our docs.
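To illustrate where the renamed setting takes effect, here is a hypothetical sketch of the memory queue's producer-facing channel, whose capacity the PR description says was hardcoded to 20 events; the broker layout and names below are assumptions, not the actual memqueue code:

```go
package memqueue

// pushRequest stands in for the real producer request type; payload elided.
type pushRequest struct{}

type broker struct {
	// events is the producer-facing "publisher" (input) queue described
	// in this PR; its capacity was previously a hardcoded 20.
	events chan pushRequest
}

func newBroker(inputQueueSize int) *broker {
	size := inputQueueSize
	if size < 20 { // fall back to the old default when unset
		size = 20
	}
	return &broker{events: make(chan pushRequest, size)}
}
```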

Contributor

@faec faec left a comment

Sorry, missed when this was updated again -- looks good, and thanks!

@adriansr adriansr merged commit 1657b5a into elastic:master Dec 3, 2020
adriansr added a commit to adriansr/beats that referenced this pull request Dec 3, 2020 (cherry picked from commit 1657b5a)
adriansr added a commit that referenced this pull request Dec 3, 2020 (cherry picked from commit 1657b5a)