Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#7662 Make StringBiValue safe against mutations to the underlying RubyString #7663

Closed
wants to merge 1 commit into from

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Jul 12, 2017

Fixes #7662

Since org.jruby.RubyString is mutable we simply cannot cache its String representation, as much as this seems to hurts performance wise.
In actuality there is no performance gain anyways since we're forced to .dup strings whenever we have shared Java and Ruby logic like in the UA filter:

https://github.com/logstash-plugins/logstash-filter-useragent/blob/master/lib/logstash/filters/useragent.rb#L138

event.set(@prefixed_major, ua_version.major.dup.force_encoding(Encoding::UTF_8)) if ua_version.major
  | event.set(@prefixed_minor, ua_version.minor.dup.force_encoding(Encoding::UTF_8)) if ua_version.minor
  | event.set(@prefixed_patch, ua_version.patch.dup.force_encoding(Encoding::UTF_8)) if ua_version.patch
  | event.set(@prefixed_build, ua_version.patchMinor.dup.force_encoding(Encoding::UTF_8)) if ua_version.patchMinor

With this fix, we get consistency by simply computing the Java object on demand. The theoretical performance hit (in practice it seems this actually speeds up the Apache dataset, likely as a result of the String objects now living for a shorter time and hence being collected easier) we can completely alleviate in real life use cases, by removing the .dup calls from plugins like the above.

@jsvd
Copy link
Member

jsvd commented Jul 13, 2017

Interesting, I wasn't aware mutating the ruby string wouldn't cause the java one to mutate too. I guess that, like you said, we were working around it doing extra .dups and .clones.

the code LGTM and I tested manually too, maybe @guyboertje can pitch in wrt to any impediment to have this merged?

@original-brownbear
Copy link
Member Author

@jsvd yea this is a nasty one, unlike other Ruby types, RubyString behaves very different from immutable String and more like mutable CharBuffer.
This spot seems to have a lot of optimization potential in general, but for now, this fixes the problem without much change :)

@original-brownbear
Copy link
Member Author

@guyboertje ping (just in case this was forgotten :))

@guyboertje
Copy link
Contributor

I have been looking for a mechanism to tell if the RubyString has been modified since it was set (from within the javaValue method) but alas there does not seem to be. Also, any form of comparison would probably be less performant than creating a new Java String each time its requested.

Further, bear in mind that we have advised/documented that in place mutation of any Event value is not supported and that a get, modify and re-set strategy is the way to go. This is mainly to prevent invalidating the accessor path cache - e.g. in place replacement of an inner part of the tree with a leaf where a path [a][b][c][d] exists and h = e.get("[a][b]"), then h['c'] = scalar without e.set('[a][b]', h). Plugin authors should know this - but Ruby filter authors may not.

Copy link
Contributor

@guyboertje guyboertje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Member Author

@guyboertje thanks for looking through this!

I have been looking for a mechanism to tell if the RubyString has been modified since it was set (from within the javaValue method) but alas there does not seem to be. Also, any form of comparison would probably be less performant than creating a new Java String each time its requested.

Yea sadly enough there doesn't seem to be any fast way of doing this (apart from simply offering a super fast Java API that returns the CharSequencer that underlies the Ruby string and mutating that ... but that may not be the best idea for maintainability :P).

@original-brownbear
Copy link
Member Author

(had to rebase, didn't merge clean into master even => waiting for green tests)

@guyboertje
Copy link
Contributor

@original-brownbear - Suggested on JRuby IRC, perhaps use public int strHashCode(Ruby runtime) on RubyString to check if value is changed - but it does require computation each time.

@elasticsearch-bot
Copy link

Armin Braun merged this into the following branches!

Branch Commits
5.x 36bf44c
master 1ebb0e3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mutating String Fields Breaks Consistency between Java and Ruby Event APIs
4 participants