
LS 6.2.3: JVM heap size filling up with default jvm.options (invokedynamic and jit.threshold is set) #9346

Closed
marioplumbarius opened this issue Apr 10, 2018 · 27 comments · Fixed by logstash-plugins/logstash-input-udp#39
@marioplumbarius

marioplumbarius commented Apr 10, 2018

  • Version:
    6.2.3
  • Operating System:
    RHEL
    jruby 9.1.13.0 (2.3.3) 2017-09-06 8e1c115 OpenJDK 64-Bit Server VM 25.162-b12 on 1.8.0_162-b12 +jit [linux-x86_64]
  • Config File (if you have sensitive info, please remove it):
input {
  udp {
    port => 55514
    type => syslog
    queue_size => 20000
    receive_buffer_bytes => 134217728
    workers => 2
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

filter {}
output {}

  • When I remove the following settings, the issue disappears.
# Turn on JRuby invokedynamic
-Djruby.compile.invokedynamic=true
# Force Compilation
-Djruby.jit.threshold=0
  • Sample Data:
"{\"type\":\"syslog\",\"@timestamp\":\"2018-04-09T15:46:03.071Z\",\"host\":\"127.0.0.1\",\"message\":\"<38>2018-04-09T15:46:03 localhost prg00000[1234]: seq: 0000057360, thread: 0000, runid: 1523288758, stamp: 2018-04-09T15:46:03 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADD\",\"@version\":\"1\"}"
  • Steps to Reproduce:
  1. Start Logstash
  2. Send 10k sample data per second
  3. Logstash is going to crash and restart
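A rough stand-in load generator (hypothetical script, not from this thread; assumes Logstash is listening locally on UDP 55514 and a ~300-byte padded payload like the sample data above):

```ruby
require "socket"

# ~300-byte padded message, shaped like the sample data above
message = "<38>2018-04-09T15:46:03 localhost prg00000[1234]: " + "PADD" * 62
socket  = UDPSocket.new

# Loop this at ~10k iterations/sec to approximate the reported load
10_000.times { socket.send(message, 0, "127.0.0.1", 55514) }
socket.close
```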

This same piece of config used to handle up to 40k events per second.

UPDATE: I've just confirmed the issue arises even when only the udp input is configured.

@andrewvc
Contributor

@marioluan I take it this started after an upgrade? Which version were you using previously?

@marioplumbarius
Author

marioplumbarius commented Apr 10, 2018 via email

@andrewvc
Contributor

Hmmmm, I had trouble reproducing this locally with 6.2.3. See the YourKit perf charts below. The heap looks quite healthy.

(image: YourKit heap/CPU charts)

I'm sending data with: while true; echo sampledata | nc -c -u localhost 55514; end (using fish shell).

How are you determining that the heap is full?

@marioplumbarius
Author

marioplumbarius commented Apr 10, 2018

@andrewvc

1. Regarding the heap size, the following error message is shown at the logs:

[2018-04-09T15:41:42,331][ERROR][org.logstash.Logstash    ] java.lang.OutOfMemoryError: Java heap space

2. How many messages per second could you send using while true?
I usually generate syslog messages using loggen.

The way I could reproduce the issue was by running loggen with the following settings:

loggen --dgram --size=300 --interval=300 --rate=10000 localhost 55514

@andrewvc
Contributor

Would you mind sharing a heap dump? https://dzone.com/articles/memory-analysis-how-to-obtain-java-heat-dump

I'll try to repro on a Linux box; loggen isn't available on OS X, AFAICT.

@marioplumbarius
Author

@andrewvc sent you an email.

Let me know when you have the linux box results in place.

@marioplumbarius
Author

@andrewvc I included a grok filter to that config and got different error messages:
Filter:

grok {
    "tag_on_failure" => []
    "match" => { "message" => "^<%{NONNEGINT}>%{SYSLOGTIMESTAMP}\[%{POSINT}\]: %{POSINT:heartbeat_ts_ms:int}$" }
  }

Errors:

[2018-04-12T14:53:18,420][ERROR][logstash.filters.grok    ] Error while attempting to check/cancel excessively long grok patterns {:message=>"Java heap space", :class=>"Java::JavaLang::OutOfMemoryError", :backtrace=>[]}

[2018-04-12T14:55:24,603][ERROR][org.logstash.Logstash    ] java.lang.IllegalArgumentException: Self-suppression not permitted

PS: This grok pattern runs OK in 5.6.3.

@marioplumbarius
Author

Another thing I noticed: the issue is intermittent, that is, after restarts it can happen (or not).

@andrewvc
Contributor

@marioluan what configuration options are you passing to LS? I'm especially interested in the batch size and the number of workers. If you aren't setting the number of workers, how many cores are you using?

@andrewvc
Contributor

andrewvc commented Apr 18, 2018

Looking at the heap dump, I see a lot of RubyString objects, ~93,400 of them. They appear to be 65,535 bytes apiece.

The weird thing here is that the Ruby strings are not really that long, they're zero padded to reach that size. The actual message is only 300 bytes. So, that's quite a bit of overhead.

I was wondering if it is the case that your batch_size*num_workers > 93400, and that maybe you were sending too many large messages, but it seems like these Ruby String objects are just many times larger than they need to be.

@andrewvc
Contributor

Also, which queue are you using? Memory or persistent?

@marioplumbarius
Author

marioplumbarius commented Apr 18, 2018

@andrewvc thanks for taking a look at this.

These are the settings from logstash.yml:

pipeline.workers: 8
pipeline.batch.size: 500
metric.collect: true

I'm also setting heap space to 1GB in jvm.options:

-Xms1g
-Xmx1g

And this is the packet I'm sending to the UDP listener:

{
  "@timestamp": "2018-04-09T15:46:03.071Z",
  "@version": "1",
  "host": "127.0.0.1",
  "message": "<38>2018-04-09T15:46:03 localhost prg00000[1234]: 
  seq: 0000057360, thread: 0000, runid: 1523288758, stamp: 
  2018-04-09T15:46:03 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPAD
  DPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPA
  DDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADD",
  "type": "syslog"
}

And I'm using the default config for queues.
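Taking the heap-dump observation at face value, a back-of-envelope calculation (my arithmetic, not from the thread) suggests these settings alone can explain the OOM, if each buffered payload retains a full 65,535-byte backing store:

```ruby
# Back-of-envelope only; assumes every buffered datagram keeps a full
# 65,535-byte backing store, as the heap dump above suggests.
bytes_per_string = 65_535

in_flight = 8 * 500      # pipeline.workers * pipeline.batch.size
queued    = 20_000       # queue_size from the udp input config

in_flight_bytes = in_flight * bytes_per_string   # ~250 MiB
queued_bytes    = queued * bytes_per_string      # ~1.2 GiB, more than -Xmx1g alone

printf("in flight: %.0f MiB, queued: %.2f GiB\n",
       in_flight_bytes / 2.0**20, queued_bytes / 2.0**30)
```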

@praseodym

The UDP plugin is likely the cause of the 65535 byte strings you are seeing:

https://github.com/logstash-plugins/logstash-input-udp/blob/cf26a0561e186544b079522e58ed5aa1c8705949/lib/logstash/inputs/udp.rb#L26

@marioplumbarius marioplumbarius changed the title LS 6.2.3: JVM heap size filling up with default jvm.options (invokedynamic and jit.threshold is set) LS 6.2.3: JVM heap size filling up Apr 18, 2018
@marioplumbarius marioplumbarius changed the title LS 6.2.3: JVM heap size filling up LS 6.2.3: JVM heap size filling up with default jvm.options (invokedynamic and jit.threshold is set) Apr 18, 2018
@jsvd
Member

jsvd commented Apr 18, 2018

Since Logstash 5.6.3 used logstash-input-udp 3.1.2 and you're able to replicate this with some consistency, it may be worth downgrading the plugin: bin/logstash-plugin install --version 3.1.2 logstash-input-udp, so that we can rule out any recent changes to the udp input.

@andrewvc
Contributor

This is definitely a JRuby behavior issue, not necessarily a bug. It looks like we (or the Ruby UDP lib) are allocating 64k for the read, then trimming the string to the actual length. Unfortunately, JRuby tries to optimize this by just changing the realSize attribute of the string internally, preventing a copy, but this prevents most of the buffer from being freed.

It'll take a little looking to see where the UDP lib is doing that, but it should be fixable by re-allocating the string.
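A sketch of the pattern being described (illustrative only, not the plugin's actual code; the realSize optimization is internal to JRuby and invisible from Ruby code):

```ruby
BUFFER_SIZE = 65_535

# Stand-in for the socket read: a full-size buffer, of which only the
# first `len` bytes are a real datagram.
buffer = "\x00" * BUFFER_SIZE
len    = 300

trimmed = buffer[0, len]  # JRuby can satisfy this by shrinking realSize
                          # while keeping the 64 KiB ByteList underneath
compact = trimmed.b       # .b copies into a right-sized, binary-encoded string

trimmed.bytesize  # => 300 either way; the waste is invisible from Ruby
compact.encoding  # => Encoding::ASCII_8BIT
```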

@praseodym

Decreasing the buffer size (a logstash-input-udp option) to your actual maximum message size (1500 bytes?) should be a quick workaround.
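In config terms, the workaround would look something like this (port and size values are illustrative; buffer_size is an existing option of the udp input):

```
input {
  udp {
    port        => 55514
    buffer_size => 1500   # down from the 65535 default
  }
}
```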

@IrlJidel
Contributor

A syslog message has to fit into one UDP packet, so we'll set it to 1452.

I guess a change in JRuby behavior would explain why we don't have this issue when we benchmark 5.6.

5.6 uses JRuby 1.7.27, while 6.2 uses JRuby 9.1.13.0.

@IrlJidel
Contributor

Thanks for the tip.

Setting buffer_size => 1452 in logstash 5.6.3 increased perf from ~42k msgs/sec to ~50k msgs/sec.

20% increase!

@original-brownbear
Member

@andrewvc I thought about this, and a possible hack/fix to get the RubyString to resize would be to call .b on it here: https://github.com/logstash-plugins/logstash-input-udp/blob/master/lib/logstash/inputs/udp.rb#L121.

Then we'd hand a properly resized RubyString down (in terms of its underlying ByteList; under the hood, .b creates a copy with a correctly sized ByteList). The codec shouldn't have any trouble dealing with the ASCII-8BIT encoding that .b forces, IMO, since data read from a UDP socket should be ASCII-8BIT in the first place.
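For reference, .b is plain Ruby (it behaves the same on MRI): it returns a binary-encoded copy and leaves the receiver untouched; the right-sized ByteList is a JRuby implementation detail of that copy:

```ruby
# 300-byte payload stand-in, labeled like the codec charset in the config
s = ("PADD" * 75).force_encoding("ISO-8859-1")
b = s.b

b.encoding    # => Encoding::ASCII_8BIT
b.bytesize    # => 300
s.encoding    # => Encoding::ISO_8859_1 -- the original is not mutated
b.equal?(s)   # => false -- .b is a copy, not the same object
```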

@andrewvc
Contributor

I took a first pass at just porting the UDP input to use the java APIs for UDP directly. You can find the result here. logstash-plugins/logstash-input-udp#38

@marioluan if you have a chance to try that out and benchmark it, that'd be awesome. If you check out that folder and replace the UDP input entry in your Gemfile with:

gem "logstash-input-udp", :path => "/path/to/logstash-input-udp"

then run bin/logstash-plugin install --no-verify and you can give it a shot.

@IrlJidel
Contributor

IrlJidel commented Apr 23, 2018

Tested on LS 5.6.3.

Tests were run for 15 minutes with a 300-byte message.

buffer_size   Patch     3.3.2
1K            52k eps   51k eps
64K           51k eps   40k eps *

* With 3.3.2 and a 64K buffer_size, udp->main CPU usage was 92%; in all other cases it was 46%.

@IrlJidel
Contributor

As String#b doesn't appear to be available in JRuby, I tried this instead:

@input_to_worker.push([payload.force_encoding('ascii-8bit'), client])

buffer_size   ascii-8bit   3.3.2
1K            48k eps      51k eps
64K           42k eps      40k eps
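One possible reason the numbers barely moved (my speculation, not stated in the thread): force_encoding relabels the receiver in place and returns self, so unlike .b it never makes a copy and would leave an oversized backing buffer in place:

```ruby
s = "PADD" * 75

# force_encoding mutates and returns the same object -- no reallocation
relabeled = s.force_encoding("ASCII-8BIT")
relabeled.equal?(s)   # => true

# .b, by contrast, returns a fresh copy
s.b.equal?(s)         # => false
```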

@jsvd
Member

jsvd commented Apr 23, 2018

@IrlJidel that seems to be the same as doing input { udp { port => 3333 codec => plain { charset => "ASCII-8BIT" } } }; can you test that? Also, can you run the benchmark with packets of size 30k and a buffer size of 64k?

@IrlJidel
Contributor

IrlJidel commented Apr 23, 2018

By the way, I ran the standard plugin on 6.2.3 with buffer_size => 1024 and turned -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 back on.

I got 64k eps!

@IrlJidel
Contributor

@jsvd We already use a codec:

I was just trying payload.force_encoding('ascii-8bit'), since payload.b wasn't supported, to try out @original-brownbear's suggestion.

 codec => plain {
      charset => "ISO-8859-1"
    }

@jsvd
Member

jsvd commented Apr 23, 2018

The .b call essentially gives you an ASCII-8BIT copy of the string, so setting the charset to that should have the same effect.

@jsvd
Member

jsvd commented Apr 23, 2018

My findings so far:

  • This happens only on JRuby 9k, and therefore only on Logstash >= 6.x
  • JIT flags aren't relevant here; I can reproduce with and without them
