Add awscloudwatch filebeat input #19025

Merged
merged 27 commits into from
Jul 1, 2020

Conversation

kaiyan-sheng
Contributor

@kaiyan-sheng kaiyan-sheng commented Jun 6, 2020

What does this PR do?

This PR adds an awscloudwatch input to Filebeat. It uses the AWS FilterLogEvents API to get all log events from a given log group.

A config using the awscloudwatch input looks like this:

filebeat.inputs:
  - type: awscloudwatch
    credential_profile_name: elastic-beats
    log_group_arn: arn:aws:logs:us-east-1:428152502467:log-group:test:*
    region: us-east-1
    scan_frequency: 30s
    start_position: beginning
    api_timeout: 5m

With this config, Filebeat enables the awscloudwatch input to collect all logs from log group test in region us-east-1 (both parsed from the log group ARN), starting from the beginning of the log group, and then checks for new log events every 30 seconds (defined by scan_frequency).

If users only want to collect new log events/messages from now on, start_position can be set to end.

Users can also specify a list of log streams under the log group to collect logs from, or a log_stream_prefix to collect events only from log streams whose names start with this prefix.
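As a rough illustration of how the log group name and region could be derived from log_group_arn, here is a minimal Go sketch. The helper name parseLogGroupARN is hypothetical and is not taken from this PR; it only assumes the colon-separated ARN layout shown in the config above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseLogGroupARN is a hypothetical helper (not the PR's code). It assumes
// the layout arn:<partition>:logs:<region>:<account>:log-group:<name>:*
// and returns the region and log group name.
func parseLogGroupARN(arn string) (region, logGroup string, err error) {
	parts := strings.Split(arn, ":")
	if len(parts) < 7 || parts[0] != "arn" || parts[2] != "logs" || parts[5] != "log-group" {
		return "", "", errors.New("not a CloudWatch log group ARN: " + arn)
	}
	if parts[3] == "" || parts[6] == "" {
		return "", "", errors.New("ARN is missing region or log group name: " + arn)
	}
	return parts[3], parts[6], nil
}

func main() {
	region, group, err := parseLogGroupARN("arn:aws:logs:us-east-1:428152502467:log-group:test:*")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("region=%s log_group=%s\n", region, group)
}
```

The sketch rejects ARNs for other services, so a misconfigured value fails early instead of producing an empty log group name.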

Sample output document:

{
  "_index": "filebeat-8.0.0-2020.06.26-000001",
  "_type": "_doc",
  "_id": "35535245334778869563151408522177877742806142229743009792",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2020-06-29T18:31:50.000Z",
    "input": {
      "type": "awscloudwatch"
    },
    "ecs": {
      "version": "1.5.0"
    },
    "event": {
      "ingested": "2020-06-29T18:33:21.207Z",
      "id": "35535245334778869563151408522177877742806142229743009792"
    },
    "message": "Apr 23 17:40:01 ip-172-31-81-156 systemd: Removed slice User Slice of root.",
    "log.file.path": "test/test1",
    "agent": {
      "version": "8.0.0",
      "ephemeral_id": "8b7ddd05-a81f-44c3-a0b2-c9a78396b97e",
      "id": "7578d49c-6588-4843-85cc-ad3859f99ed1",
      "name": "KaiyanMacBookPro",
      "type": "filebeat"
    },
    "awscloudwatch": {
      "log_group": "test",
      "log_stream": "test1",
      "ingestion_time": "2020-06-29T18:31:51.000Z"
    },
    "cloud": {
      "provider": "aws",
      "region": "us-east-1"
    }
  }
}

Why is it important?

This input allows users to collect logs from CloudWatch without first sending them to an S3 bucket with SQS set up for notification.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Go to AWS CloudWatch portal and click Logs https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups
  2. Create a new log group and log stream
  3. Click Action and Create log event
  4. Start Filebeat with a filebeat.yml that contains an awscloudwatch input like the one below:
filebeat.inputs:
  - type: awscloudwatch
    credential_profile_name: elastic-beats
    log_group_arn: arn:aws:logs:us-east-1:428152502467:log-group:test:*
    region: us-east-1
    scan_frequency: 30s
    start_position: beginning
    api_timeout: 5m
  5. You should see logs in Kibana. If you create new log events in AWS CloudWatch, they should also be collected after the next scan_frequency interval.

Related issues

closes #17292

Screenshots

When start_position is set to beginning, all existing logs are collected and then Filebeat scans for new events every 30 seconds (based on scan_frequency):
[Screenshot: Screen Shot 2020-06-23 at 4 56 48 PM]

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 6, 2020
@kaiyan-sheng kaiyan-sheng self-assigned this Jun 6, 2020
@kaiyan-sheng kaiyan-sheng added the in progress Pull request is currently in progress. label Jun 6, 2020
@andresrc andresrc added the Team:Platforms Label for the Integrations - Platforms team label Jun 8, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 8, 2020
@elasticmachine
Collaborator

elasticmachine commented Jun 15, 2020

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #19025 updated]

  • Start Time: 2020-07-01T19:45:44.592+0000

  • Duration: 35 min 17 sec

Test stats 🧪

Test Results
Failed 0
Passed 555
Skipped 128
Total 683

@kaiyan-sheng kaiyan-sheng added enhancement needs_backport PR is waiting to be backported to other branches. test-plan Add this PR to be manual test plan labels Jun 24, 2020
Contributor

@exekias exekias left a comment

Great to see this progressing! I did a first pass. I think we can better leverage nextToken when reading from the API, and avoid using timestamps for filtering.

@kaiyan-sheng kaiyan-sheng added [zube]: In Review and removed [zube]: In Progress in progress Pull request is currently in progress. labels Jun 29, 2020
Comment on lines +23 to +25
ScanFrequency time.Duration `config:"scan_frequency" validate:"min=0,nonzero"`
APITimeout time.Duration `config:"api_timeout" validate:"min=0,nonzero"`
APISleep time.Duration `config:"api_sleep" validate:"min=0,nonzero"`
Contributor

api_sleep and scan_frequency are very similar concepts with very different names here. Would it make sense to unify these a little bit?

Contributor Author

scan_frequency defines how long this Filebeat input sleeps before rechecking for new logs.
api_sleep defines the sleep time between each FilterLogEvents API call within the same Filebeat collection cycle.

How about scan_frequency and api_frequency?

Contributor

I see your point. Let's keep api_sleep and make sure this is well documented.
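To make the difference between the two settings concrete, here is a minimal Go sketch of the loop structure described in this thread. It is not the PR's implementation: page, fetchFunc, fakeFetch, and collectOnce are hypothetical stand-ins for the FilterLogEvents call and its paginated response, and the durations are shortened so the example finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// page is a hypothetical stand-in for one FilterLogEvents response.
type page struct {
	events    []string
	nextToken string // empty when there are no more pages
}

// fetchFunc is a hypothetical stand-in for a single FilterLogEvents call.
type fetchFunc func(token string) page

// collectOnce drains every page of one collection cycle, following
// nextToken and calling pause between consecutive API calls (api_sleep).
func collectOnce(fetch fetchFunc, pause func()) []string {
	var events []string
	token := ""
	for {
		p := fetch(token)
		events = append(events, p.events...)
		if p.nextToken == "" {
			return events
		}
		token = p.nextToken
		pause()
	}
}

// fakeFetch serves canned pages keyed by the token that requests them.
func fakeFetch(pages map[string]page) fetchFunc {
	return func(token string) page { return pages[token] }
}

func main() {
	pages := map[string]page{
		"":   {events: []string{"a", "b"}, nextToken: "t1"},
		"t1": {events: []string{"c"}},
	}
	scanFrequency := 50 * time.Millisecond // assumed value, stands in for scan_frequency
	apiSleep := 10 * time.Millisecond      // assumed value, stands in for api_sleep

	for cycle := 1; cycle <= 2; cycle++ {
		events := collectOnce(fakeFetch(pages), func() { time.Sleep(apiSleep) })
		fmt.Printf("cycle %d collected %d events\n", cycle, len(events))
		time.Sleep(scanFrequency) // wait before rechecking for new logs
	}
}
```

In this sketch api_sleep only applies inside a cycle, between paginated calls, while scan_frequency separates whole cycles.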

"awscloudwatch": common.MapStr{
"log_group": logGroup,
"log_stream": *logEvent.LogStreamName,
"ingestion_time": time.Unix(*logEvent.IngestionTime/1000, 0),
Contributor

I wonder if this can be mapped to event.ingested https://www.elastic.co/guide/en/ecs/current/ecs-event.html

Contributor Author

event.ingested in ECS is the timestamp when an event arrived in the central data store. My understanding is that this is the timestamp when the event gets to Elasticsearch. But this ingestion_time is the time the event was ingested into AWS CloudWatch.

Maybe I'm misunderstanding event.ingested in ECS? 😬

Contributor

You may be right here. Let's keep the current naming for now.
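For context on the ingestion_time value discussed in this thread: the snippet under review divides the value by 1000 before passing it to time.Unix, which implies it is in epoch milliseconds, and its zero nanosecond argument drops any sub-second part. A minimal sketch of a conversion that keeps the milliseconds could look like this. The helper name millisToTime and the sample value are illustrative assumptions, not code from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// millisToTime is a hypothetical helper that converts epoch milliseconds
// to a UTC time.Time without discarding the millisecond part.
// time.UnixMilli requires Go 1.17 or later.
func millisToTime(ms int64) time.Time {
	return time.UnixMilli(ms).UTC()
}

func main() {
	// Illustrative value only: epoch milliseconds with a .123 fraction.
	ingestion := int64(1593455511123)
	fmt.Println(millisToTime(ingestion).Format(time.RFC3339Nano))
}
```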

@kaiyan-sheng kaiyan-sheng merged commit 07639fe into elastic:master Jul 1, 2020
@kaiyan-sheng kaiyan-sheng deleted the aws_cloudwatch_input branch July 1, 2020 21:15
@kaiyan-sheng kaiyan-sheng added v7.9.0 and removed needs_backport PR is waiting to be backported to other branches. labels Jul 1, 2020
kaiyan-sheng added a commit that referenced this pull request Jul 2, 2020
* Add awscloudwatch filebeat input (#19025)

* Add awscloudwatch filebeat input
* Use log group ARN instead of log group name and region name
* add api_sleep, log_group_name and region_name config

(cherry picked from commit 07639fe)
@andresrc andresrc added test-plan-added This PR has been added to the test plan and removed [zube]: Done labels Jul 14, 2020
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
* Add awscloudwatch filebeat input
* Use log group ARN instead of log group name and region name
* add api_sleep, log_group_name and region_name config
Labels
enhancement review Team:Platforms Label for the Integrations - Platforms team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.9.0
Development

Successfully merging this pull request may close these issues.

[Filebeat] Investigate adding cloudwatch input
4 participants