Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Use java DatagramSocket directly #38

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

andrewvc
Copy link

Attempts to address elastic/logstash#9346 by just using the Java UDP API directly. The issue there is that JRuby's implementation uses too much memory, the underlying ruby strings returned from the UDP socket use up as much memory as the UDP buffer size, regardless of how big the actual packet is.

I would like to benchmark this and make one more pass before this is ready for review.

@praseodym
Copy link
Contributor

I haven’t tested or benchmarked this PR, but it might be worthwhile to use Netty instead of the Java socket API. Other plugins like logstash-input-tcp already do.

(Also, this would be so much simpler once the Logstash Java API is available!)

@andrewvc
Copy link
Author

andrewvc commented Apr 20, 2018 via email

@IrlJidel
Copy link
Contributor

You need to assign to rcvbuf

@andrewvc
Copy link
Author

andrewvc commented Apr 20, 2018 via email

@andrewvc
Copy link
Author

@IrlJidel good catch on rcvbuf, that should be fixed now. I think this needs another pass from me this week to double check java exception handling and also to add more tests. That error, for instance didn't fail the test suite.

@marioplumbarius
Copy link

Let me know when you think it's OK, so I can run perf tests using this patch and see the results before releasing it.

@logger.warn("Unable to set receive_buffer_bytes to desired size. Requested #{@receive_buffer_bytes} but obtained #{rcvbuf} bytes.")
@udp.setReceiveBufferSize(@receive_buffer_bytes)

actual_receive_buffer_bytes = @udp.getReceiveBufferBytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be @udp.getReceiveBufferSize


@udp.bind(@host, @port)
@logger.info("UDP listener started", :address => "#{@host}:#{@port}", :receive_buffer_bytes => "#{rcvbuf}", :queue_size => "#{@queue_size}")
@logger.info("UDP listener started", :address => "#{@host}:#{@port}", :receive_buffer_bytes => "#{@receive_buffer_bytes}", :queue_size => "#{@queue_size}")
Copy link
Contributor

@IrlJidel IrlJidel Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep to previous behavior we should be logging using actual_receive_buffer_bytes rather than @receive_buffer_bytes

@jsvd
Copy link
Member

jsvd commented Apr 23, 2018

I'd like to attempt a fix of the issue with JRuby instead of going directly to a rewrite of the plugin. For reference, a workaround was proposed by @original-brownbear here.

That said I'm +1 on exploring a more native implementation, just not for the reason of fixing the memory usage issue.

@IrlJidel
Copy link
Contributor

jruby RubyString doesn't have a .b method

@andrewvc
Copy link
Author

andrewvc commented Apr 23, 2018 via email

@IrlJidel
Copy link
Contributor

I put the perf test results in elastic/logstash#9346 (comment) but might make more sense to have them here:

tested on LS 5.6.3.

Tests were run for 15minutes with 300 byte message.

buffer_size Patch 3.3.2
1K 52k eps 51k eps
64K 51k eps 40k eps *
  • 3.3.2 with 64K buffer_size udp->main cpu usage was 92%, for all other cases was 46%

Couldn't test using @original-brownbear suggestion to use .b as this method doesn't appear to be available in jruby

@jsvd
Copy link
Member

jsvd commented Apr 23, 2018

If you think it's not worthwhile however I'm glad to abandon it.

I seem to have explained myself badly for you arrive at this statement. I think it's worth exploring a more native
and Java based approach, the same way we've been doing for all other plugins like tcp, http, date, etc.

What I want to alert to is that it's different to review and accept a PR that rewrites the critical path of the plugin vs one that fixes an issue by changing a small part (how the data read from the buffer is handled).
The test suite for this plugin isn't great as you mentioned, which doesn't give me any confidence that a change like this won't be worse (or better) for the many use cases (the list is left as an exercise for the reader) of this plugin. The only guarantee we have is that the things that are tested work and that many many people that use the plugin as-is, and are hitting a known list of bugs seen in this repo's issues list.

To sum up, for this PR to be the solution for the fix, it needs to go to extra lengths to provide enough confidence that current use cases are not worse overall (if it improves some or even all, great!).

@andrewvc
Copy link
Author

@jsvd those are all valid concerns. I have a lot of irons in the fire and likely won't be able to address things in that detail given time concerns.

I'm going to leave this PR open, but I won't do further work on it.

@jsvd
Copy link
Member

jsvd commented Apr 23, 2018

Just for reference, I tried running this on logstash 6.2.4 and got an exception when setting the host field in the event:

[2018-04-23T17:16:36,493][ERROR][logstash.inputs.udp      ] Exception in inputworker {"exception"=>org.logstash.MissingConverterException: Missing Converter handling for full class name=java.net.Inet4Address, simple name=Inet4Address, "backtrace"=>["org.logstash.Valuefier.fallbackConvert(Valuefier.java:97)", "org.logstash.Valuefier.convert(Valuefier.java:75)", "org.logstash.Valuefier.lambda$static$3(Valuefier.java:51)", "org.logstash.Valuefier.convert(Valuefier.java:73)", "org.logstash.ext.JrubyEventExtLibrary$RubyEvent.ruby_set_field(JrubyEventExtLibrary.java:95)", "org.logstash.ext.JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.call(JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.gen)", "org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:741)", "org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)", "Users.joaoduarte.projects.logstash_plugins.logstash_minus_input_minus_udp.lib.logstash.inputs.udp.RUBY$method$push_decoded_event$0(/Users/joaoduarte/projects/logstash_plugins/logstash-input-udp/lib/logstash/inputs/udp.rb:155)", "Users.joaoduarte.projects.logstash_plugins.logstash_minus_input_minus_udp.lib.logstash.inputs.udp.RUBY$method$push_decoded_event$0$__VARARGS__(/Users/joaoduarte/projects/logstash_plugins/logstash-input-udp/lib/logstash/inputs/udp.rb)", "org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:77)", "org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:93)", "org.jruby.ir.targets.InvokeSite.invoke(InvokeSite.java:145)", 

@andrewvc
Copy link
Author

We should, by the way, aim for a minimal refactor, re-initializing the string somehow to free that memory.

@jsvd
Copy link
Member

jsvd commented Apr 23, 2018

Also, for reference, I created an issue in JRuby about this: jruby/jruby#5148

@praseodym
Copy link
Contributor

We could also simply print a warning log message stating that using the default buffer size is bad for peformance. Or, if we'd want to get better out-of-the-box performance, decrease the default buffer size to 1472 and add a warning message that the buffer size should be set explicitly to the maximum message length that the user expects.

@jsvd
Copy link
Member

jsvd commented May 2, 2018

I created a PR that works around the memory issue #39

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants