
LS 6.2.3: JVM heap size filling up with default jvm.options (invokedynamic and jit.threshold is set) #9346

Closed
marioplumbarius opened this issue Apr 10, 2018 · 27 comments · Fixed by logstash-plugins/logstash-input-udp#39
@marioplumbarius

marioplumbarius commented Apr 10, 2018

  • Version:
    6.2.3
  • Operating System:
    RHEL
    jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 OpenJDK 64-Bit Server VM 25.162-b12 on 1.8.0_162-b12 +jit [linux-x86_64]
  • Config File (if you have sensitive info, please remove it):
input {
  udp {
    port => 55514
    type => syslog
    queue_size => 20000
    receive_buffer_bytes => 134217728
    workers => 2
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

filter {}
output {}

  • When I remove the following settings, the issue disappears.
# Turn on JRuby invokedynamic
-Djruby.compile.invokedynamic=true
# Force Compilation
-Djruby.jit.threshold=0
  • Sample Data:
"{\"type\":\"syslog\",\"@timestamp\":\"2018-04-09T15:46:03.071Z\",\"host\":\"127.0.0.1\",\"message\":\"<38>2018-04-09T15:46:03 localhost prg00000[1234]: seq: 0000057360, thread: 0000, runid: 1523288758, stamp: 2018-04-09T15:46:03 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADD\",\"@version\":\"1\"}"
  • Steps to Reproduce:
  1. Start Logstash
  2. Send 10k sample data per second
  3. Logstash is going to crash and restart
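A rough stand-in load generator (hypothetical script, not from this thread; assumes Logstash is listening locally on UDP 55514 and a ~300-byte padded payload like the sample data above):

```ruby
require "socket"

# ~300-byte padded message, shaped like the sample data above
message = "<38>2018-04-09T15:46:03 localhost prg00000[1234]: " + "PADD" * 62
socket  = UDPSocket.new

# Loop this at ~10k iterations/sec to approximate the reported load
10_000.times { socket.send(message, 0, "127.0.0.1", 55514) }
socket.close
```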

This same piece of config used to handle up to 40k events per second.

UPDATE: I've just confirmed the issue arises even when only the udp input is configured.

@andrewvc
Contributor

@marioluan I take it this started after an upgrade? Which version were you using previously?

@marioplumbarius
Author

marioplumbarius commented Apr 10, 2018 via email

@andrewvc
Contributor

Hmmmm, I had trouble reproducing this locally with 6.2.3. See the YourKit perf charts below. The heap looks quite healthy.

(image: YourKit heap/CPU charts)

I'm sending data with: while true; echo sampledata | nc -c -u localhost 55514; end (using fish shell).

How are you determining that the heap is full?

@marioplumbarius
Author

marioplumbarius commented Apr 10, 2018

@andrewvc

1. Regarding the heap size, the following error message is shown at the logs:

[2018-04-09T15:41:42,331][ERROR][org.logstash.Logstash    ] java.lang.OutOfMemoryError: Java heap space

2. How many messages per second could you send using while true?
I usually generate syslog messages using loggen.

The way I could reproduce the issue was by running loggen with the following settings:

loggen --dgram --size=300 --interval=300 --rate=10000 localhost 55514

@andrewvc
Contributor

Would you mind sharing a heap dump? https://dzone.com/articles/memory-analysis-how-to-obtain-java-heat-dump

I'll try to repro on a Linux box; loggen isn't available on OS X, AFAICT.

@marioplumbarius
Author

@andrewvc sent you an email.

Let me know when you have the linux box results in place.

@marioplumbarius
Author

@andrewvc I included a grok filter to that config and got different error messages:
Filter:

grok {
    "tag_on_failure" => []
    "match" => { "message" => "^<%{NONNEGINT}>%{SYSLOGTIMESTAMP}\[%{POSINT}\]: %{POSINT:heartbeat_ts_ms:int}$" }
  }

Errors:

[2018-04-12T14:53:18,420][ERROR][logstash.filters.grok    ] Error while attempting to check/cancel excessively long grok patterns {:message=>"Java heap space", :class=>"Java::JavaLang::OutOfMemoryError", :backtrace=>[]}

[2018-04-12T14:55:24,603][ERROR][org.logstash.Logstash    ] java.lang.IllegalArgumentException: Self-suppression not permitted

PS: This grok pattern runs OK in 5.6.3.

@marioplumbarius
Author

Another thing I noticed: the issue is intermittent, that is, after restarts it can happen (or not).

@andrewvc
Contributor

@marioluan what configuration options are you passing to LS? I'm especially interested in the batch size and the number of workers. If you aren't setting the number of workers, how many cores are you using?

@andrewvc
Contributor

andrewvc commented Apr 18, 2018

Looking at the heap dump, I see a lot of RubyString objects, ~93,400 of them. They appear to be 65,535 bytes apiece.

The weird thing here is that the Ruby strings are not really that long, they're zero padded to reach that size. The actual message is only 300 bytes. So, that's quite a bit of overhead.

I was wondering if it is the case that your batch_size*num_workers > 93400, and that maybe you were sending too many large messages, but it seems like these Ruby String objects are just many times larger than they need to be.

@andrewvc
Contributor

Also, which queue are you using? Memory or persistent?

@marioplumbarius
Author

marioplumbarius commented Apr 18, 2018

@andrewvc thanks for taking a look at this.

These are the settings from logstash.yml:

pipeline.workers: 8
pipeline.batch.size: 500
metric.collect: true

I'm also setting heap space to 1GB in jvm.options:

-Xms1g
-Xmx1g

And this is the packet I'm sending to the UDP listener:

{
  "@timestamp": "2018-04-09T15:46:03.071Z",
  "@version": "1",
  "host": "127.0.0.1",
  "message": "<38>2018-04-09T15:46:03 localhost prg00000[1234]: 
  seq: 0000057360, thread: 0000, runid: 1523288758, stamp: 
  2018-04-09T15:46:03 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPAD
  DPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPA
  DDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADD",
  "type": "syslog"
}

And I'm using the default config for queues.
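Taking the heap-dump observation at face value, a back-of-envelope calculation (my arithmetic, not from the thread) suggests these settings alone can explain the OOM, if each buffered payload retains a full 65,535-byte backing store:

```ruby
# Back-of-envelope only; assumes every buffered datagram keeps a full
# 65,535-byte backing store, as the heap dump above suggests.
bytes_per_string = 65_535

in_flight = 8 * 500      # pipeline.workers * pipeline.batch.size
queued    = 20_000       # queue_size from the udp input config

in_flight_bytes = in_flight * bytes_per_string   # ~250 MiB
queued_bytes    = queued * bytes_per_string      # ~1.2 GiB, more than -Xmx1g alone

printf("in flight: %.0f MiB, queued: %.2f GiB\n",
       in_flight_bytes / 2.0**20, queued_bytes / 2.0**30)
```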

@praseodym

The UDP plugin is likely the cause of the 65535 byte strings you are seeing:

https://github.com/logstash-plugins/logstash-input-udp/blob/cf26a0561e186544b079522e58ed5aa1c8705949/lib/logstash/inputs/udp.rb#L26

@marioplumbarius marioplumbarius changed the title LS 6.2.3: JVM heap size filling up with default jvm.options (invokedynamic and jit.threshold is set) LS 6.2.3: JVM heap size filling up Apr 18, 2018
@marioplumbarius marioplumbarius changed the title LS 6.2.3: JVM heap size filling up LS 6.2.3: JVM heap size filling up with default jvm.options (invokedynamic and jit.threshold is set) Apr 18, 2018
@jsvd
Member

jsvd commented Apr 18, 2018

Since Logstash 5.6.3 used logstash-input-udp 3.1.2 and you're able to replicate this with some consistency, it may be worth downgrading the plugin: bin/logstash-plugin install --version 3.1.2 logstash-input-udp, so that we can rule out any recent changes to the udp input.

@andrewvc
Contributor

This is definitely a JRuby behavior issue, not necessarily a bug. It looks like we (or the Ruby UDP lib) are allocating 64k for the read, then trimming the string to the actual length. Unfortunately, JRuby tries to optimize this by just changing the realSize attribute of the string internally, preventing a copy, but this prevents most of the buffer from being freed.

It'll take a little looking to see where the UDP lib is doing that, but it should be fixable by re-allocating the string.
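A sketch of the pattern being described (illustrative only, not the plugin's actual code; the realSize optimization is internal to JRuby and invisible from Ruby code):

```ruby
BUFFER_SIZE = 65_535

# Stand-in for the socket read: a full-size buffer, of which only the
# first `len` bytes are a real datagram.
buffer = "\x00" * BUFFER_SIZE
len    = 300

trimmed = buffer[0, len]  # JRuby can satisfy this by shrinking realSize
                          # while keeping the 64 KiB ByteList underneath
compact = trimmed.b       # .b copies into a right-sized, binary-encoded string

trimmed.bytesize  # => 300 either way; the waste is invisible from Ruby
compact.encoding  # => Encoding::ASCII_8BIT
```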

@praseodym

Decreasing the buffer size (a logstash-input-udp option) to your actual maximum message size (1500 bytes?) should be a quick workaround.
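In config terms, the workaround would look something like this (port and size values are illustrative; buffer_size is an existing option of the udp input):

```
input {
  udp {
    port        => 55514
    buffer_size => 1500   # down from the 65535 default
  }
}
```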

@IrlJidel
Contributor

A syslog message has to fit into one UDP packet, so we'll set it to 1452.

I guess a change in JRuby behavior would explain why we don't have this issue when we benchmark 5.6.

5.6 uses JRuby 1.7.27, while 6.2 uses JRuby 9.1.13.0.

@IrlJidel
Contributor

Thanks for the tip.

Setting buffer_size => 1452 in logstash 5.6.3 increased perf from ~42k msgs/sec to ~50k msgs/sec.

20% increase!

@original-brownbear
Member

@andrewvc I thought about this, and a possible hack/fix to get the RubyString to resize would be to call .b on it here: https://github.com/logstash-plugins/logstash-input-udp/blob/master/lib/logstash/inputs/udp.rb#L121.

Then we'd hand a properly resized RubyString down (in terms of its underlying ByteList; under the hood, .b creates a copy with a correctly sized ByteList). The codec shouldn't have any trouble dealing with the ASCII-8BIT encoding that .b forces, IMO, since data read from a UDP socket should be ASCII-8BIT in the first place.
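For reference, .b is plain Ruby (it behaves the same on MRI): it returns a binary-encoded copy and leaves the receiver untouched; the right-sized ByteList is a JRuby implementation detail of that copy:

```ruby
# 300-byte payload stand-in, labeled like the codec charset in the config
s = ("PADD" * 75).force_encoding("ISO-8859-1")
b = s.b

b.encoding    # => Encoding::ASCII_8BIT
b.bytesize    # => 300
s.encoding    # => Encoding::ISO_8859_1 -- the original is not mutated
b.equal?(s)   # => false -- .b is a copy, not the same object
```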

@andrewvc
Contributor

I took a first pass at just porting the UDP input to use the java APIs for UDP directly. You can find the result here. logstash-plugins/logstash-input-udp#38

@marioluan if you have a chance to try that out and benchmark it, that'd be awesome. If you check out that folder and replace the UDP input entry in your Gemfile with:

gem "logstash-input-udp", :path => "/path/to/logstash-input-udp"

then run bin/logstash-plugin install --no-verify and you can give it a shot.

@IrlJidel
Contributor

IrlJidel commented Apr 23, 2018

Tested on LS 5.6.3.

Tests were run for 15 minutes with a 300-byte message.

buffer_size   Patch     3.3.2
1K            52k eps   51k eps
64K           51k eps   40k eps *

* With 3.3.2 and a 64K buffer_size, udp->main CPU usage was 92%; in all other cases it was 46%.

@IrlJidel
Contributor

As String#b doesn't appear to be available in JRuby, I tried this instead:

@input_to_worker.push([payload.force_encoding('ascii-8bit'), client])

buffer_size   ascii-8bit   3.3.2
1K            48k eps      51k eps
64K           42k eps      40k eps
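One possible reason the numbers barely moved (my speculation, not stated in the thread): force_encoding relabels the receiver in place and returns self, so unlike .b it never makes a copy and would leave an oversized backing buffer in place:

```ruby
s = "PADD" * 75

# force_encoding mutates and returns the same object -- no reallocation
relabeled = s.force_encoding("ASCII-8BIT")
relabeled.equal?(s)   # => true

# .b, by contrast, returns a fresh copy
s.b.equal?(s)         # => false
```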

@jsvd
Member

jsvd commented Apr 23, 2018

@IrlJidel that seems to be the same as doing input { udp { port => 3333 codec => plain { charset => "ASCII-8BIT" } } }; can you test that? Also, can you run the benchmark with packets of size 30k and a buffer size of 64k?

@IrlJidel
Contributor

IrlJidel commented Apr 23, 2018

By the way, I ran the standard plugin on 6.2.3 with buffer_size => 1024 and turned -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 back on.

I got 64k eps!

@IrlJidel
Contributor

@jsvd We already use a codec:

I was just trying payload.force_encoding('ascii-8bit'), since payload.b wasn't supported, to try out @original-brownbear's suggestion.

 codec => plain {
      charset => "ISO-8859-1"
    }

@jsvd
Member

jsvd commented Apr 23, 2018

The .b call essentially gives you an ASCII-8BIT copy of the string, so setting the charset to that should have the same effect.

@jsvd
Member

jsvd commented Apr 23, 2018

My findings so far:

  • This happens only on JRuby 9k, and therefore only on Logstash >= 6.x
  • JIT flags aren't relevant here; I can reproduce with and without them
