[Filebeat] Add parse_aws_vpc_flow_log processor #33656

Merged

Conversation

andrewkroh
Member

@andrewkroh andrewkroh commented Nov 12, 2022

What does this PR do?

This is a processor for parsing AWS VPC flow logs. It requires a user-specified log format. It can populate the original flow log fields, ECS fields, or both.

Usage:

processors:
  - parse_aws_vpc_flow_log: 
      format: version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
  - community_id: ~
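
To illustrate what the `format` setting drives, here is a minimal Go sketch of positional parsing: each whitespace-separated value in a log line is paired with the field name at the same position in the format string. The helper name `parseFlowLog` and the sample record are assumptions for illustration only; this is not the processor's actual code, which also handles typing and ECS mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlowLog pairs each whitespace-separated value in line with the field
// name at the same position in format. Hypothetical helper for illustration;
// it is not the processor's implementation.
func parseFlowLog(format, line string) (map[string]string, error) {
	names := strings.Fields(format)
	values := strings.Fields(line)
	if len(names) != len(values) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(names), len(values))
	}
	fields := make(map[string]string, len(names))
	for i, name := range names {
		// Flow logs use "-" for a field with no value; leave those out.
		if values[i] == "-" {
			continue
		}
		fields[name] = values[i]
	}
	return fields, nil
}

func main() {
	const format = "version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status"
	// Sample record made up for this sketch.
	const line = "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"

	fields, err := parseFlowLog(format, line)
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["srcaddr"], fields["dstport"], fields["action"])
	// prints "172.31.16.139 22 ACCEPT"
}
```
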

Benchmark:

goos: darwin
goarch: arm64
pkg: github.com/elastic/beats/v7/x-pack/filebeat/processors/aws_vpcflow
BenchmarkProcessorRun/original-mode-v5-message-10                2810948              2138 ns/op            2836 B/op         31 allocs/op
BenchmarkProcessorRun/ecs-mode-v5-message-10                     1914754              3107 ns/op            1908 B/op         41 allocs/op
BenchmarkProcessorRun/ecs_and_original-mode-v5-message-10        1693279              3538 ns/op            3076 B/op         41 allocs/op

Why is it important?

Flow logs are typically high-volume, which makes processing them a hot path. A dedicated Beat processor makes that processing as efficient as possible and lets it run on the Beat host's CPU.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Notes

  • Need to add static fields like event.kind, event.type, cloud.provider, etc. These can be added by other processors.
  • Need to compare output with Fleet package.
    • NOTE: ecs mode in the processor is more aggressive in reducing duplication. The Fleet integration might want to use ecs_and_original then drop the fields needed to maintain its existing behavior.

This is a processor for parsing AWS VPC flow logs. It requires a
user-specified log format. It can populate the original flow log fields,
ECS fields, or both.

Usage:

processors:
  - parse_aws_vpc_flow_log:
      format: version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status

Benchmark:

goos: darwin
goarch: arm64 (Apple M1 Max)
pkg: github.com/elastic/beats/v7/x-pack/filebeat/processors/aws_vpcflow
BenchmarkProcessorRun/v5-mode-original-10                2694968              2212 ns/op            2836 B/op         31 allocs/op
BenchmarkProcessorRun/v5-mode-ecs_and_original-10        1812913              3318 ns/op            2972 B/op         36 allocs/op
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 12, 2022
@andrewkroh andrewkroh force-pushed the feature/fb/parse-aws-vpc-flow-log-processor branch from d149e21 to 6bcd888 Compare November 12, 2022 07:31
@elasticmachine
Collaborator

elasticmachine commented Nov 12, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-11-16T16:09:24.065+0000

  • Duration: 104 min 3 sec

Test stats 🧪

Test Results
Failed 0
Passed 24043
Skipped 1951
Total 25994

💚 Flaky test report

Tests succeeded.

@andrewkroh andrewkroh marked this pull request as ready for review November 13, 2022 20:21
@andrewkroh andrewkroh requested review from a team as code owners November 13, 2022 20:21
@andrewkroh andrewkroh requested review from fearful-symmetry and leehinman and removed request for a team November 13, 2022 20:21
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 13, 2022
Contributor

@leehinman leehinman left a comment

Nice.

One thing I noticed: in existing implementations you can have a mix of VPC flow log formats, as long as they have a different number of fields. Could we change this so it can handle multiple formats? The use case I see is someone modifying their config for recording VPC flow logs, which would result in different formats being in the same S3 bucket.


@andrewkroh
Member Author

andrewkroh commented Nov 16, 2022

One thing I noticed: in existing implementations you can have a mix of VPC flow log formats, as long as they have a different number of fields. Could we change this so it can handle multiple formats? The use case I see is someone modifying their config for recording VPC flow logs, which would result in different formats being in the same S3 bucket.

I think this was an accidental feature. IMO we wanted to give the users the flexibility of an arbitrary format, but didn't have a way to deliver it due to limitations in ingest pipelines. So instead we loaded the pipeline with a few prescribed formats to get close to the feature.

I can implement this with the requirement that each format have a unique field count. The execution time cost increases by about 1.5% when running a config with a single pattern. This would allow the Fleet integration to adopt the feature without a breaking change.

Comparison between 3a6e4cd .. 70bde35

name                                              old time/op    new time/op    delta
ProcessorRun/original-mode-v5-message-10            2.21µs ± 0%    2.14µs ± 0%   ~     (p=1.000 n=1+1)
ProcessorRun/ecs-mode-v5-message-10                 2.80µs ± 0%    2.82µs ± 0%   ~     (p=1.000 n=1+1)
ProcessorRun/ecs_and_original-mode-v5-message-10    3.21µs ± 0%    3.23µs ± 0%   ~     (p=1.000 n=1+1)
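
The unique-field-count requirement can be sketched as a lookup keyed by the number of fields. This is an illustration under that assumption, not the code in this PR; the `formatSet` type and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// formatSet indexes configured formats by their number of fields.
// Hypothetical type for illustration only.
type formatSet map[int][]string

// newFormatSet rejects configurations where two formats have the same field
// count, because a log line could then not be matched unambiguously.
func newFormatSet(formats ...string) (formatSet, error) {
	set := make(formatSet, len(formats))
	for _, f := range formats {
		names := strings.Fields(f)
		if _, dup := set[len(names)]; dup {
			return nil, fmt.Errorf("more than one format has %d fields", len(names))
		}
		set[len(names)] = names
	}
	return set, nil
}

// match splits the line and returns the field names of the format whose
// field count equals the number of values in the line.
func (s formatSet) match(line string) (names, values []string, err error) {
	values = strings.Fields(line)
	names, ok := s[len(values)]
	if !ok {
		return nil, nil, fmt.Errorf("no format with %d fields", len(values))
	}
	return names, values, nil
}

func main() {
	set, err := newFormatSet(
		"version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status",
		"srcaddr dstaddr srcport dstport action",
	)
	if err != nil {
		panic(err)
	}
	// A five-value line selects the five-field format.
	names, values, err := set.match("10.0.0.1 10.0.0.2 443 51000 REJECT")
	if err != nil {
		panic(err)
	}
	for i, name := range names {
		fmt.Printf("%s=%s\n", name, values[i])
	}
}
```

With a single configured format the lookup reduces to one map access per event, which fits the small overhead reported above.
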

@andrewkroh andrewkroh requested a review from leehinman November 16, 2022 03:01
This gives a speedup and reduces the cost of adding multiple format support.

benchmark                                                     old ns/op     new ns/op     delta
BenchmarkProcessorRun/original-mode-v5-message-10             2225          2136          -4.00%
BenchmarkProcessorRun/ecs-mode-v5-message-10                  2875          2817          -2.02%
BenchmarkProcessorRun/ecs_and_original-mode-v5-message-10     3352          3233          -3.55%
@andrewkroh andrewkroh added backport-v8.6.0 Automated backport with mergify and removed backport-skip Skip notification from the automated backport with mergify labels Nov 16, 2022
@andrewkroh andrewkroh force-pushed the feature/fb/parse-aws-vpc-flow-log-processor branch from c123a55 to 024b82b Compare November 16, 2022 14:17
Event type is a list. It will always contain "connection". Either "allowed" or "denied" will be
added based on the VPC flow action of "ACCEPT" or "REJECT".
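
A minimal sketch of that mapping, assuming a hypothetical helper named `eventType` (the processor's real function and field handling may differ):

```go
package main

import "fmt"

// eventType returns the event.type values for a flow log action.
// "connection" is always present; "allowed" or "denied" is appended for
// ACCEPT or REJECT. Any other action yields only "connection".
// Hypothetical helper for illustration only.
func eventType(action string) []string {
	types := []string{"connection"}
	switch action {
	case "ACCEPT":
		types = append(types, "allowed")
	case "REJECT":
		types = append(types, "denied")
	}
	return types
}

func main() {
	fmt.Println(eventType("ACCEPT")) // prints "[connection allowed]"
	fmt.Println(eventType("REJECT")) // prints "[connection denied]"
}
```
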
@andrewkroh
Member Author

andrewkroh commented Nov 16, 2022

@Mergifyio backport 8.6

mergify bot pushed a commit that referenced this pull request Nov 16, 2022
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
(cherry picked from commit 1a86e42)
@mergify
Contributor

mergify bot commented Nov 16, 2022

backport 8.6

✅ Backports have been created

andrewkroh added a commit that referenced this pull request Nov 16, 2022
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
(cherry picked from commit 1a86e42)

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Labels
backport-v8.6.0 Automated backport with mergify enhancement Filebeat Filebeat