
Filling up logfile / IOError: No space left on device #3561

Closed
johnarnold opened this issue Jul 6, 2015 · 5 comments

Comments

@johnarnold
Contributor

Running version 1.5.1-1

johnar@logr1west1:/etc/logstash/conf.d$ apt-cache policy logstash
logstash:
Installed: 1:1.5.1-1
Candidate: 1:1.5.1-1
Version table:
*** 1:1.5.1-1 0
100 /var/lib/dpkg/status

Disk appears to be getting filled up; every event is being logged to the logfile.

cat logstash.err:

IOError: No space left on device
write at org/jruby/RubyIO.java:1412
write at org/jruby/RubyIO.java:2490
puts at org/jruby/RubyIO.java:2380
<< at /opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb:52
synchronize at org/jruby/ext/thread/Mutex.java:149
<< at /opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/outputs/io.rb:50
publish at /opt/logstash/vendor/bundle/jruby/1.9/gems/cabin-0.7.1/lib/cabin/channel.rb:176
each at org/jruby/RubyHash.java:1341
publish at /opt/logstash/vendor/bundle/jruby/1.9/g
synchronize at org/jruby/ext/thread/Mutex.java:149
publish at /opt/logstash/vendor/bundle/jruby/1.9/g
_log at /opt/logstash/vendor/bundle/jruby/1.9/g
log_with_level at /opt/logstash/vendor/bundle/jruby/1.9/g
warn at /opt/logstash/vendor/bundle/jruby/1.9/g
buffer_flush at /opt/logstash/vendor/bundle/jruby/1.9/g
buffer_flush at /opt/logstash/vendor/bundle/jruby/1.9/g
each at org/jruby/RubyHash.java:1341
buffer_flush at /opt/logstash/vendor/bundle/jruby/1.9/g
buffer_flush at /opt/logstash/vendor/bundle/jruby/1.9/g
buffer_receive at /opt/logstash/vendor/bundle/jruby/1.9/g
receive at /opt/logstash/vendor/bundle/jruby/1.9/g /outputs/elasticsearch.rb:455
handle at /opt/logstash/vendor/bundle/jruby/1.9/g :88
worker_setup at /opt/logstash/vendor/bundle/jruby/1.9/g :79

johnar@logr1west1:/var/log/logstash$ ls -lrt
total 26888176
-rw-r--r-- 1 root root 57 Jul 6 21:00 logstash.stdout
-rw-rw-rw- 1 logstash logstash 27533471744 Jul 6 21:01 logstash.log
-rw-r--r-- 1 root root 11029 Jul 6 21:01 logstash.err

Logstash options:

$ sudo ps ax | grep logstash

21963 pts/2 SNl 0:17 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/var/lib/logstash -Xmx500m -Xss2048k -Djffi.boot.library.path=/opt/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/var/lib/logstash -Xbootclasspath/a:/opt/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/opt/logstash/vendor/jruby -Djruby.lib=/opt/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /opt/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /etc/logstash/conf.d -l /var/log/logstash/logstash.log -w 6

Sanitized config:
http://termbin.com/95v7

@jordansissel
Contributor

every event is being logged to the logfile.

Can you show me some examples of these?

While I wait, here are some initial thoughts given the data you provided:

  • Possible: You may be using output { stdout { ... } } without realizing it. I see this commented out in the config you provided, but your Logstash uses -f /etc/logstash/conf.d, which I assume is a directory full of configs - one of them may still have an active stdout output (a quick grep like the one below would confirm this).
  • Unlikely: You may have Logstash configured with debug or verbose logging. I do not see this in your ps output, though (I looked for --debug, --verbose, and -v flags).
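
If it helps, a quick check along these lines (assuming everything Logstash loads lives under /etc/logstash/conf.d) would surface any stdout output that is still active:

    # List every mention of stdout in the config directory; -R follows
    # symlinked config files, and commented-out lines will start with '#'.
    grep -Rn "stdout" /etc/logstash/conf.d/

Anything that shows up uncommented inside an output { } block would explain the per-event logging.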

@johnarnold
Contributor Author

Jordan, I think this is related to:
logstash-plugins/logstash-output-elasticsearch#144

I'm getting logs like:
{:timestamp=>"2015-07-06T23:32:37.791000+0000", :message=>"failed action with response of 400, dropping action: ["index", {:_id=>nil, :_index=>"logstash-2015.07.06", :_type=>"logs", :_routing=>nil}, #<LogStash::Event:0x53a3fd32 @metadata_accessors=#<LogStash::Util::Accessors:0x510314e5 @store={"retry_count"=>0}, @lut={}>, @cancelled=false, @data={"message"=>"logger: [ssl_req][06/Jul/2015:23:16:21 +0000] 64.39.105.54 TLSv1 DHE-RSA-AES256-SHA \"GET /2014/users/10/shipping_addresses HTTP/1.1\" 1296", "@version"=>"1", "@timestamp"=>"2015-07-06T23:16:21.000Z", "source"=>"syslog", "host"=>"bn3-f89-bspos-7b", "path"=>"/var/log/firehose.log", "host_ip"=>"10
....etc

The config file is a symlink up one directory:
$ ls -l /etc/logstash/conf.d
total 0
lrwxrwxrwx 1 root root 16 Jun 26 21:13 logstash.conf -> ../logstash.conf

@jordansissel
Contributor

Ahh. I'm not sure your problem is logstash-plugins/logstash-output-elasticsearch#144 at this time - that ticket is very long and has lots of mixed reports.

Focusing just on your issue, the error (400) comes from Elasticsearch because it is rejecting something about your data - what exactly, I don't know at this time. If possible, see what Elasticsearch's logs have to say. Alternatively, you could sniff the HTTP response coming back from Elasticsearch.
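
For example, something along these lines (assuming Elasticsearch is reachable at localhost:9200; the index and type names are taken from your log excerpt above) would show the exact 400 body for a single replayed event:

    # Index one of the rejected events by hand and print the full HTTP
    # response; the 400 body from Elasticsearch says why it was refused.
    curl -i -XPOST 'http://localhost:9200/logstash-2015.07.06/logs' \
         -d '{"message": "paste one rejected event body here"}'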

@johnarnold
Contributor Author

Good advice... found this on one of the masters:

[2015-07-06 23:45:42,080][WARN ][cluster.action.shard ] [logmaster2] [logstash-2015.07.06][3] received shard failed for [logstash-2015.07.06][3], node[srkzaGS5TNiN8H7JWfpZdA], [R], s[INITIALIZING], indexUUID [MaXO6tx7TDO7g74ObXkfxg], reason [shard failure [failed recovery][RecoveryFailedException[[logstash-2015.07.06][3]: Recovery failed from [logdata5][x8kENbNRSiCki0JaMp2uow][logdata5][inet[/10.80.80.11:9300]]{master=false} into [logdata9][srkzaGS5TNiN8H7JWfpZdA][logdata9][inet[/10.80.80.15:9300]]{master=false}]; nested: RemoteTransportException[[logdata5][inet[/10.80.80.11:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[logstash-2015.07.06][3] Phase[2] Execution failed]; nested: RemoteTransportException[[logdata9][inet[/10.80.80.15:9300]][internal:index/shard/recovery/translog_ops]]; nested: MapperParsingException[failed to parse [device_timestamp]]; nested: MapperParsingException[failed to parse date field [Jul 6 23:33:11], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "Jul 6 23:33:11"]; ]]

This is weird because according to the index mapping, device_timestamp should be a string:
device_timestamp string
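
A check along these lines (assuming Elasticsearch is reachable at localhost:9200) shows the live mapping for the field:

    # Dump the mapping of today's index and pull out the device_timestamp
    # entry; host and index name match the setup described above.
    curl -s 'http://localhost:9200/logstash-2015.07.06/_mapping?pretty' | grep -A 3 '"device_timestamp"'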

Assuming there is a fixable mapping error here, it would be nice to handle this more gracefully in Logstash rather than spamming the logfile. I'll see if I can fix it and then update the issue.

@johnarnold
Contributor Author

Root cause:
Elasticsearch was unable to index certain device_timestamp fields due to a conflict between the dynamically mapped template (which cast these as datetime) and the received field (which could be a really ugly string). This caused Elasticsearch to return HTTP 400 codes to Logstash, which mishandled these events and dumped them to the logfile with no indication of what was wrong. The logfile would fill up the disk, eventually causing Logstash to die (and not restart successfully).

Resolution:

es-template.txt has been updated to hardcode the "device_timestamp" field as a string:

    "device_timestamp" : { "type": "string", "index": "not_analyzed" }

This resolves indexing errors seen on the master due to invalid mapping of events:
[2015-07-06 23:45:42,080][WARN ][cluster.action.shard ] [logmaster2] [logstash-2015.07.06][3] received shard failed for [logstash-2015.07.06][3], node[srkzaGS5TNiN8H7JWfpZdA], [R], s[INITIALIZING], indexUUID [MaXO6tx7TDO7g74ObXkfxg], reason [shard failure [failed recovery][RecoveryFailedException[[logstash-2015.07.06][3]: Recovery failed from [logdata5][x8kENbNRSiCki0JaMp2uow][logdata5][inet[/10.80.80.11:9300]]{master=false} into [logdata9][srkzaGS5TNiN8H7JWfpZdA][logdata9][inet[/10.80.80.15:9300]]{master=false}]; nested: RemoteTransportException[[logdata5][inet[/10.80.80.11:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[logstash-2015.07.06][3] Phase[2] Execution failed]; nested: RemoteTransportException[[logdata9][inet[/10.80.80.15:9300]][internal:index/shard/recovery/translog_ops]]; nested: MapperParsingException[failed to parse [device_timestamp]]; nested: MapperParsingException[failed to parse date field [Jul 6 23:33:11], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: "Jul 6 23:33:11"]; ]]

I had to nuke today’s index in order to recreate the mappings. Seems to be running pretty smoothly now.
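
In case it is useful to anyone hitting the same thing, the fix amounted to roughly the following (a sketch, assuming Elasticsearch at localhost:9200, a template named "logstash", and es-template.txt in the current directory; adjust names and paths to your setup):

    # Re-upload the corrected index template, then delete the bad index so
    # it is recreated with the new mapping on the next indexed event.
    # Note: deleting the index drops today's data.
    curl -XPUT 'http://localhost:9200/_template/logstash' -d @es-template.txt
    curl -XDELETE 'http://localhost:9200/logstash-2015.07.06'

Pointing the elasticsearch output's template and template_overwrite options at the updated file and restarting Logstash should accomplish the same template upload.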

I think the plugin should not spam the log for mapping problems like this (at least with debug off). I'll close this case and comment on the other open case under the plugin. Thanks again, Jordan.
