Cherry-pick #22827 to 7.10: [Auditbeat] system/socket: Monitor all online CPUs #22873

Merged
merged 2 commits into from
Dec 3, 2020

Conversation

adriansr
Contributor

@adriansr adriansr commented Dec 2, 2020

Cherry-pick of PR #22827 to 7.10 branch. Original message:

What does this PR do?

This patch updates the tracing library in Auditbeat to fetch the list of online CPUs from /sys/devices/system/cpu/online so that it can install kprobes on all of them regardless of its own affinity mask, and correctly skip offline CPUs.

Why is it important?

Auditbeat's system/socket dataset needs to install kprobes on all online CPUs.

Previously, it was using Go's runtime.NumCPU() to determine the CPUs in the system, and monitoring CPUs 0 to NumCPU-1. This was a mistake that led to startup failures or loss of events in any of the following scenarios:

  • When Auditbeat is started with a CPU affinity mask that excludes some CPUs.
  • When there are offline or isolated CPUs in the system.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

The easiest way to reproduce the problem is to start Auditbeat with a CPU affinity mask that excludes the first CPU and only allows it to run on the second CPU:

sudo taskset 2 auditbeat [...]

This will pin Auditbeat to CPU1 while kprobes will be installed on CPU0, preventing the guesses from working.

Alternatively, one can disable a few CPUs before launching Auditbeat:

# echo 0 > /sys/devices/system/cpu/cpu0/online

Related issues

Related #18755

This PR fixes most of the problems reported in the above issue, but the main issue is fixed by #22787.

Auditbeat's system/socket dataset needs to install kprobes on all
online CPUs.

Previously, it was using runtime.NumCPU() to determine the CPUs in the
system, and monitoring CPUs 0 to NumCPU-1. This was a mistake that led
to startup failures or loss of events in any of the following scenarios:
- When Auditbeat is started with a CPU affinity mask that excludes some CPUs
- When there are offline or isolated CPUs in the system.

This patch updates the tracing library in Auditbeat to fetch the list of
online CPUs from /sys/devices/system/cpu/online so that it can install
kprobes on all of them regardless of its own affinity mask, and correctly
skip offline CPUs.

Related elastic#18755

(cherry picked from commit 6356887)
@adriansr adriansr requested a review from a team as a code owner December 2, 2020 19:48
@botelastic botelastic bot added the needs_team label Dec 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team label Dec 2, 2020
@adriansr adriansr requested review from a team and removed request for a team December 2, 2020 19:48
@adriansr adriansr added the review label Dec 2, 2020
@elasticmachine
Collaborator

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #22873 opened

  • Start Time: 2020-12-02T19:48:45.217+0000

  • Duration: 27 min 55 sec

Test stats 🧪

Test Results
Failed 0
Passed 232
Skipped 33
Total 265

💚 Flaky test report

Tests succeeded.


@adriansr adriansr merged commit 9cc8c67 into elastic:7.10 Dec 3, 2020
@zube zube bot removed the [zube]: Done label Mar 4, 2021