Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

packetbeat: add option to allow sniffer to change device when default route changes #32681

Merged
merged 4 commits into from
Sep 13, 2022

Conversation

efd6
Copy link
Contributor

@efd6 efd6 commented Aug 15, 2022

What does this PR do?

Rather than attempting to append new sniffer device data to an established dump,
file, dump files are rolled when a new device is used. The alternative would
require keeping a track of link type and *pcapgo.Writer between calls to sniff
to ensure that the dump file is not truncated and that the header is
appropriately written.

If the user needs to have a single pcap file for the session, they can be merged
using either wireshark or mergecap.

Why is it important?

This makes packetbeat more useful on non-linux platforms that do not allow sniffing on the "any" device.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Make the following change and run packetbeat locally with packetbeat run -e.

diff --git a/packetbeat/packetbeat.yml b/packetbeat/packetbeat.yml
index ace50f29d3..18d30247ce 100644
--- a/packetbeat/packetbeat.yml
+++ b/packetbeat/packetbeat.yml
@@ -13,12 +13,13 @@
 # "any" keyword to sniff on all connected interfaces. On all platforms, you
 # can use "default_route", "default_route_ipv4" or "default_route_ipv6"
 # to sniff on the device carrying the default route.
-packetbeat.interfaces.device: any
+packetbeat.interfaces.device: default_route
 
 # Specify the amount of time between polling for changes in the default
 # route. This option is only used when one of the default route devices
 # is specified.
-packetbeat.interfaces.poll_default_route: 1m
+packetbeat.interfaces.poll_default_route: 20s
+packetbeat.interfaces.dumpfile: dump
 
 # The network CIDR blocks that are considered "internal" networks for
 # the purpose of network perimeter boundary classification. The valid

You should see a set of files in the form dump-<timestamp>.pcap. Opening these with wireshark should show you the network activity on the default route. Changing network interface should not stop capture, though there may be interruptions up to 20s long.

Related issues

Use cases

Screenshots

Logs

@efd6 efd6 self-assigned this Aug 15, 2022
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 15, 2022
@efd6 efd6 force-pushed the network_traffic_follow_default branch from 42b301f to 767ac40 Compare August 15, 2022 01:30
@mergify
Copy link
Contributor

mergify bot commented Aug 15, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b network_traffic_follow_default upstream/network_traffic_follow_default
git merge upstream/main
git push upstream network_traffic_follow_default

@efd6 efd6 force-pushed the network_traffic_follow_default branch from 767ac40 to 3c5e822 Compare August 15, 2022 01:38
@elasticmachine
Copy link
Collaborator

elasticmachine commented Aug 15, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-25T06:14:59.656+0000

  • Duration: 47 min 13 sec

Test stats 🧪

Test Results
Failed 0
Passed 1673
Skipped 19
Total 1692

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@efd6 efd6 force-pushed the network_traffic_follow_default branch from 3c5e822 to c04f430 Compare August 15, 2022 02:15
@efd6 efd6 marked this pull request as ready for review August 15, 2022 03:47
@efd6 efd6 requested a review from a team as a code owner August 15, 2022 03:47
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@efd6 efd6 force-pushed the network_traffic_follow_default branch from c04f430 to fdaff9b Compare August 17, 2022 00:47
Comment on lines +257 to +278
const timeSuffixFormat = "20060102150405"
filename := fmt.Sprintf("%s-%s.pcap", s.config.Dumpfile, time.Now().Format(timeSuffixFormat))
logp.Info("creating new dump file %s", filename)
f, err := os.Create(filename)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will result in file clobbering if more than one sniffer is run by the config manager. This is not worse than what we currently have, but worth being aware of.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some testing by switching over to a Wireguard VPN on my macbook. It switches the default route over to a utun devices. The poller did not observe the change. I was expected this to result in a failure of some kind since Packetbeat cannot sniff "raw" devices.

# No VPN.
% route -n get default
   route to: default
destination: default
       mask: default
    gateway: 10.100.11.1
  interface: en0
      flags: <UP,GATEWAY,DONE,STATIC,PRCLONING,GLOBAL>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500         0
# Wireguard enabled.
% route -n get default                    
   route to: default
destination: default
       mask: default
  interface: utun5
      flags: <UP,DONE,CLONING,STATIC,GLOBAL>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1420         0 

packetbeat/sniffer/sniffer.go Outdated Show resolved Hide resolved
var err error
last, worker, err = s.sniffOneDynamic(device, last, worker)
if err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about an edge case that we probably want to handle. If the default route moves to an unsupported link type temporarily, do we want Packetbeat to continue watching for changes even if it cannot sniff the current interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that should have a timeout or just continue forever? I'm leaning towards forever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm thinking forever too. If I leave home and turn on my VPN, which cannot be monitored due to it being a raw device, I still want packetbeat to start monitoring when the VPN goes off.

side note: I think it would be useful to have a monitoring metric that reports the current interface name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work for the default route case, but when there are multiple interfaces this won't work AFAICS. So I think the best thing to do is to publish it as the default route.

@andrewkroh
Copy link
Member

I tested two other scenarios:

  1. Transitioning from wifi to Ethernet dongle on macos. ✅
  2. Transitioning from Ethernet dongle back to wifi by disconnecting the USB dongle. Packetbeat exited. 🛑 Maybe this is something we should consider handling.
{"log.level":"error","@timestamp":"2022-08-23T00:20:41.957-0400","log.origin":{"file.name":"instance/beat.go","file.line":1051},"message":"Exiting: sniffer loop failed: sniffing error: Read Error","service.name":"packetbeat","ecs.version":"1.6.0"}
Exiting: sniffer loop failed: sniffing error: Read Error

@efd6
Copy link
Contributor Author

efd6 commented Aug 23, 2022

ISTM that if we never fail out due to an error in reading the network, we cover both of #32681 (comment) and #32681 (comment). There is no reason to fail out for a read error (or probably any gopacket error) AFAICS.

@mergify
Copy link
Contributor

mergify bot commented Aug 23, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b network_traffic_follow_default upstream/network_traffic_follow_default
git merge upstream/main
git push upstream network_traffic_follow_default

@efd6 efd6 force-pushed the network_traffic_follow_default branch from f3a656c to df66ff6 Compare August 23, 2022 06:52
Comment on lines +335 to 348
// If we are following a default route, request an interface
// refresh and log the error.
if s.followDefault {
select {
case refresh <- struct{}{}:
default:
// Don't request to refresh if already requested.
}
logp.Warn("error during packet capture: %v", err)
continue
}

s.state.Store(snifferInactive)
return fmt.Errorf("sniffing error: %w", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raises the question of how we treat errors on non-default route interfaces. I think it's arguable that they should also just keep trying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I can make arguments for it both ways.

If the interfaces goes way temporarily for some reason then I still want Packetbeat to resume when it comes back. I could let systemd/launchd/agent handle restarting the process continuously.

On the other hand, it is useful to have hard failure when testing out my config. I can quickly catch that I typo'ed the interface name if Packetbeat exits immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so an option would be to keep a state of whether there was ever a connection. If there was, keep trying, but if there wasn't, fail out. I would not want to add this in this change, but it is an option to consider in the future.

@mergify
Copy link
Contributor

mergify bot commented Aug 24, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b network_traffic_follow_default upstream/network_traffic_follow_default
git merge upstream/main
git push upstream network_traffic_follow_default

@efd6 efd6 force-pushed the network_traffic_follow_default branch from 1b884dc to 469ff29 Compare August 24, 2022 22:29
efd6 added 3 commits August 25, 2022 08:01
… route changes

Rather than attempting to append new sniffer device data to an established dump,
file, dump files are rolled when a new device is used. The alternative would
require keeping a track of link type and *pcapgo.Writer between calls to sniff
to ensure that the dump file is not truncated and that the header is
appropriately written.

If the user needs to have a single pcap file for the session, they can be merged
using either wireshark or mergecap.
Use this when there have been too many timeouts, or when there is an error
condition during packet capture.
@efd6 efd6 force-pushed the network_traffic_follow_default branch from 469ff29 to 41403c7 Compare August 24, 2022 22:32
@efd6
Copy link
Contributor Author

efd6 commented Aug 24, 2022

@efd6 efd6 requested a review from andrewkroh September 7, 2022 08:06
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

The error handling in the sniffer w.r.t. unhandled interface types is something we should improve upon separately.

@efd6
Copy link
Contributor Author

efd6 commented Sep 13, 2022

The error handling in the sniffer w.r.t. unhandled interface types is something we should improve upon separately.

Absolutely.

@efd6 efd6 merged commit 9f4bb06 into elastic:main Sep 13, 2022
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
… route changes (#32681)

* packetbeat: add option to allow sniffer to change device when default route changes

Rather than attempting to append new sniffer device data to an established dump,
file, dump files are rolled when a new device is used. The alternative would
require keeping a track of link type and *pcapgo.Writer between calls to sniff
to ensure that the dump file is not truncated and that the header is
appropriately written.

If the user needs to have a single pcap file for the session, they can be merged
using either wireshark or mergecap.

* packetbeat/sniffer: add logic to allow requests for interface refresh

Use this when there have been too many timeouts, or when there is an error
condition during packet capture.

* register a default_route metric when following default route
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.5 candidate backport-skip Skip notification from the automated backport with mergify enhancement Packetbeat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants