Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filebeat multiple large files high cpu 300% #2786

Closed
githubnovee opened this issue Oct 14, 2016 · 6 comments
Closed

filebeat multiple large files high cpu 300% #2786

githubnovee opened this issue Oct 14, 2016 · 6 comments

Comments

@githubnovee
Copy link

githubnovee commented Oct 14, 2016

  • Version: filebeat version 5.0.0-rc1
  • Operating System: centos 6.2_64
  • Steps to Reproduce: send log to kafka
    What's happen about filebeat version 5.0.0-rc1 (amd64), libbeat 5.0.0-rc1? what's problem in my config。 log ,read files as following:

filebeat.yml:

filebeat.prospectors:
 input_type: log
  paths:
    - *.*
  encoding: utf-8
  fields: {"id": "455","name": "talk","set": "apple","host": "10.10.10.10","module": "tags"}
  fields_under_root: false
  ignore_older: 5m
  document_type: log
  scan_frequency: 10s
  harvester_buffer_size: 16384
  max_bytes: 10485760
  tail_files: true
  backoff: 1s
  max_backoff: 10s
  backoff_factor: 2
  harvester_limit: 5
  close_inactive: 5m
  close_renamed: true
  close_removed: true
  close_eof: true
filebeat.spool_size: 20480
filebeat.idle_timeout: 5s
filebeat.registry_file: /data/home/user00/filebeat/registry.log
filebeat.shutdown_timeout: 0
queue_size: 1000
bulk_queue_size: 100
max_procs: 4
output.kafka:
  hosts: ["10.10.10.11:9092","10.10.10.12:9092","10.10.10.13:9092"]
  topic: 455
  metadata:
    retry.max: 3
    retry.backoff: 250ms
    refresh_frequency: 10m
  worker: 1
  max_retries: 3
  bulk_max_size: 20480
  timeout: 30s
  broker_timeout: 10s
  channel_buffer_size: 256
  keep_alive: 0
  compression: gzip
  max_message_bytes: 1000000
  flush_interval: 1s
logging.level: info
logging.to_syslog: false
logging.metrics.period: 30s
logging.to_files: true
logging.files:
  path: /data/home/user00/filebeat/logs/
  name: filebeat.log
  keepfiles: 7

log:

2016-10-14T17:35:49+08:00 INFO Metrics logging every 30s
2016-10-14T17:35:49+08:00 INFO Harvester started for file: 20161014_1700.log.3
2016-10-14T17:35:49+08:00 INFO End of file reached: 20161014_1700.log.3. Closing because close_eof is enabled.
2016-10-14T17:35:49+08:00 INFO Harvester started for file: 20161014_1700.log.4
2016-10-14T17:35:49+08:00 INFO End of file reached: 20161014_1700.log.4. Closing because close_eof is enabled.
2016-10-14T17:35:54+08:00 WARN kafka message: Initializing new client
2016-10-14T17:35:54+08:00 WARN client/metadata fetching metadata for all topics from broker 10.231.22.206:9092
2016-10-14T17:35:54+08:00 WARN Connected to broker at 10.10.10.11:9092 (unregistered)
2016-10-14T17:35:54+08:00 WARN client/brokers registered new broker #2 at 10.10.10.11:9092
2016-10-14T17:35:54+08:00 WARN client/brokers registered new broker #1 at 10.10.10.12:9092
2016-10-14T17:35:54+08:00 WARN client/brokers registered new broker #3 at 10.10.10.13:9092
2016-10-14T17:35:54+08:00 WARN kafka message: Successfully initialized new client
2016-10-14T17:35:54+08:00 WARN producer/broker/2 starting up
2016-10-14T17:35:54+08:00 WARN producer/broker/2 state change to [open] on 455/1
2016-10-14T17:35:54+08:00 WARN producer/broker/1 starting up
2016-10-14T17:35:54+08:00 WARN producer/broker/1 state change to [open] on 455/0
2016-10-14T17:35:54+08:00 WARN producer/broker/3 starting up
2016-10-14T17:35:54+08:00 WARN producer/broker/3 state change to [open] on 455/2
2016-10-14T17:35:54+08:00 WARN Connected to broker at 10.185.23.207:9092 (registered as #2)
2016-10-14T17:35:54+08:00 WARN Connected to broker at 10.241.124.196:9092 (registered as #3)
2016-10-14T17:35:54+08:00 WARN Connected to broker at 10.231.22.206:9092 (registered as #1)
2016-10-14T17:35:59+08:00 INFO Harvester started for file: 20161014_1700.log.4
2016-10-14T17:36:19+08:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.started=3 libbeat.kafka.published_and_acked_events=163850 publish.events
=163855 registar.states.current=2 filebeat.harvester.open_files=1 libbeat.publisher.published_events=184330 filebeat.harvester.running=1 libbeat.kafka.call_
count.PublishEvents=10 registrar.states.update=163855 filebeat.harvester.closed=2 registrar.writes=9

2016-10-14T17:36:49+08:00 INFO Non-zero metrics in the last 30s: registrar.writes=14 libbeat.publisher.published_events=266240 publish.events=286720 
libbeat.kafka.call_count.PublishEvents=13 libbeat.kafka.published_and_acked_events=286720 registrar.states.update=286720

2016-10-14T17:37:19+08:00 INFO Non-zero metrics in the last 30s: registrar.states.update=266240 libbeat.kafka.published_and_acked_events=266240 
libbeat.kafka.call_count.PublishEvents=14 libbeat.publisher.published_events=286720 publish.events=266240 registrar.writes=13

read files:

-rw-rw-rw- 1 tlog users 977M 2016-10-14 16:10 20161014_1600.log
-rw-rw-rw- 1 tlog users 977M 2016-10-14 16:20 20161014_1600.log.1
-rw-rw-rw- 1 tlog users 977M 2016-10-14 16:31 20161014_1600.log.2
-rw-rw-rw- 1 tlog users 977M 2016-10-14 16:41 20161014_1600.log.3
-rw-rw-rw- 1 tlog users 977M 2016-10-14 16:50 20161014_1600.log.4
-rw-rw-rw- 1 tlog users 977M 2016-10-14 16:57 20161014_1600.log.5
-rw-rw-rw- 1 tlog users 303M 2016-10-14 16:59 20161014_1600.log.6
@ruflin
Copy link
Contributor

ruflin commented Oct 17, 2016

@githubnovee On startup the high CPU usage is kind of expected as filebeat tries to send the events as fast as possible. Is the load going down again after filebeat sent all the data and only has to send new events?

@tsg
Copy link
Contributor

tsg commented Oct 17, 2016

The logs indicate it's processing around 9.5k to 10k per second. That's not too bad, but for 300% CPU I'd have expected more, considering that there's no multiline and no filtering involved.

@githubnovee Are you looking to reduce the CPU usage (accepting slower speed)? You could cap it to 100% by setting max_procs: 1. Or you are simply expecting better performance?

@tsg
Copy link
Contributor

tsg commented Oct 17, 2016

@githubnovee As a first basic test, it would be interesting to make a test on the same data with the file output instead of the kafka one, to see if the bottleneck is in Filebeat reading the files or in sending them to Kafka. Thanks.

@githubnovee
Copy link
Author

@tsg I adjusted CPUs by seetting max_procs :1 , CPU load looks normal, but the amount of the log is small.

logs as following:
2016-10-18T10:50:32+08:00 INFO Non-zero metrics in the last 10s: publish.events=122880 registrar.writes=6 libbeat.kafka.call_count.PublishEvents=6 libbeat.kafka.published_and_acked_events=42264 registrar.states.update=122880 libbeat.publisher.published_events=43032
2016-10-18T10:50:42+08:00 INFO Non-zero metrics in the last 10s: libbeat.kafka.call_count.PublishEvents=5 registrar.states.update=102400 libbeat.kafka.published_and_acked_events=35870 registrar.writes=5 libbeat.publisher.published_events=35661 publish.events=102400
2016-10-18T10:50:52+08:00 INFO Non-zero metrics in the last 10s: registrar.states.update=102400 publish.events=102400 registrar.writes=5 libbeat.kafka.call_count.PublishEvents=5 libbeat.kafka.published_and_acked_events=36358 libbeat.publisher.published_events=36222

@tsg
Copy link
Contributor

tsg commented Oct 18, 2016

So you get 3-4K per second, which confirms in a way that Filebeat is CPU bound. Can you try with the file output instead of the Kafka output, please? It will help us get a first feeling of where the bottleneck is.

If you can, we'd also appreciate some profiling information that you can get like this:

  • start Filebeat with the debugging proxy started, for example:
filebeat -e -httpprof 127.0.0.1:6060
  • (optionally from a second system) Start a profiling session in another terminal. It's important to have Filebeat busy during this time:
go tool pprof http://localhost:6060/debug/pprof/profile
  • At the prompt, type:
png > profiling.png

This will create a profiling.png dot image, which you can attach to this ticket. Let me know if you hit any issues with the steps above. There are also other options for profiling.

@ruflin
Copy link
Contributor

ruflin commented Nov 28, 2017

Closing this as no feedback was received in the last 12 month and lots of improvements have been made since then on filebeat. @githubnovee Feel free to reopen if this is still an issue.

@ruflin ruflin closed this as completed Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants