Use less memory while parsing initial large payloads #32

sw-square · 2022-05-25T21:05:06Z

Brief Summary

Similar to #20, the Ruby Server SDK calls Eventsource with large payloads of JSON data, large amount of initial memory usage. The culprit seems to be the use of regex to match the data.

Regex

The cause of the memory issues seems to be here

else
  case line
    when /^(\w+): ?(.*)$/
      item = process_field($1, $2)
      gen.yield item if !item.nil?
  end
end

It seems that about 20x line's length is allocated when parsing the regex, which when given 10MB+ of data can easily top 200MB.

The solution is to instead find the index of the colon itself, and then modify the line in-place, avoiding the need to match the regex on the entire input:

elsif (colon = line.index(':'))
  name = line.slice!(0...colon)
  next unless name.match?(/^\w+$/)

  # delete the colon followed by an optional space
  line.delete_prefix!(':')
  line.delete_prefix!(' ')

  item = process_field(name, line)
  gen.yield item if !item.nil?

Duplicating the line

In the process of finding the regex memory issue, another, more minor one came up here:

@data << "\n" if @have_data
@data << value
@have_data = true

When this code is executed and @have_data is false, then the entire value string is copied into @data. A simple fix for this is just:

if @have_data
	@data << "\n" << value
else
	@data = value
end
@have_data = true

This way we reuse the original string which won't end up duplicating it

keelerm84 · 2022-05-27T15:26:06Z

Thank you for this contribution!

I will try to get to this PR today, but it might not be until Monday. I didn't want to leave you hanging over the weekend without a response though.

keelerm84 · 2022-05-31T21:20:43Z

lib/ld-eventsource/impl/event_parser.rb

+              name = line.slice!(0...colon)
+              next unless name.match?(/^\w+$/)
+
+              # delete the colon followed by an optional space
+              line.delete_prefix!(':')
+              line.delete_prefix!(' ')


I can definitely see how this change will greatly reduce the memory usage, particularly when the payload isn't a comment.

However, if a comment is sent, I think the next bit of logic can have its memory consumption further reduced.

Mutating the string in place seems to take more memory than calling the non-destructive delete_prefix. Maybe something like this?

Suggested change

name = line.slice!(0...colon)

next unless name.match?(/^\w+$/)

# delete the colon followed by an optional space

line.delete_prefix!(':')

line.delete_prefix!(' ')

name = line.slice(0...colon)

next unless name.match?(/^\w+$/)

# delete the colon followed by an optional space

line = line.slice(colon...).delete_prefix(':').delete_prefix(" ")

What do you think?

Mutating the string in place seems to take more memory than calling the non-destructive delete_prefix

I accept that that's a finding you've seen, but it's very counter-intuitive to me in terms of how allocating a brand new string could possibly take less memory than mutating a string in place, so I'm puzzled.

And if we are OK with allocating a brand-new string, then I think it's safe to say that the most efficient approach would be to find out how many characters we want to exclude and then just call substring with an offset that is past those characters, rather than making these three separate calls to compute intermediate strings. Similarly, if we want to mutate the existing string, I would favor getting the name substring first, and then computing the offset past the colon/space and calling slice! once, instead of calling slice! followed by two delete_prefix! calls.

That's really interesting that mutating in-place takes more memory, i would never have guessed that. I'm ok with this change, the regex was the big culprit, the strings were more minor

I would still prefer to be doing a single slice! here, rather than slice! followed by delete_prefix! twice. If line contains a very large piece of data (which is the use case we're concerned about in the first place) then those two delete_prefix! calls are each causing the whole buffer to be shifted left one character at a time. It should be simple enough to instead just add 1 to colon (there is no real decision being made by that first delete_prefix, we already know for sure that the character at that position is ':'), then check if the next character (if any) is a space and increment it again if so.

I mean, I understand that the regex usage was a bigger problem and this is better than that, but that's no reason to still be incurring unnecessary overhead in the improved code.

In fact, we could remove one of the delete_prefix!s by simply using a ternary operator. Something along the lines of delete_prefix! space ? ': ' : ':'

What I'm saying is that you can remove both of them. There is no need to mutate the string more than once, and I'd prefer that we not do so, given that the point of these changes is to reduce unnecessary overhead when the string is very long.

name = line.slice(0...colon) colon += 1 colon += 1 if colon < line.length && line[colon] == ' ' line.slice!(colon...)

lib/ld-eventsource/impl/event_parser.rb

eli-darkly

I think at this point we can merge the PR and take care of any final polishing ourselves.

eli-darkly · 2022-06-16T00:14:48Z

The 6.3.3 release of the Ruby SDK includes this fix. Thanks!

keelerm84 reviewed May 31, 2022

View reviewed changes

eli-darkly reviewed Jun 9, 2022

View reviewed changes

lib/ld-eventsource/impl/event_parser.rb Outdated Show resolved Hide resolved

Fix regex memory usage error

8b93b58

sw-square force-pushed the main branch from f0091ec to 8b93b58 Compare June 13, 2022 18:08

eli-darkly approved these changes Jun 15, 2022

View reviewed changes

eli-darkly merged commit f82c135 into launchdarkly:main Jun 15, 2022

LaunchDarklyReleaseBot mentioned this pull request Jun 15, 2022

prepare 6.3.3 release launchdarkly/ruby-server-sdk#201

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use less memory while parsing initial large payloads #32

Use less memory while parsing initial large payloads #32

sw-square commented May 25, 2022

keelerm84 commented May 27, 2022

keelerm84 May 31, 2022

eli-darkly Jun 9, 2022

eli-darkly Jun 9, 2022 •

edited

Loading

sw-square Jun 10, 2022

eli-darkly Jun 13, 2022

eli-darkly Jun 13, 2022

sw-square Jun 14, 2022

eli-darkly Jun 14, 2022

eli-darkly left a comment

eli-darkly commented Jun 16, 2022

Use less memory while parsing initial large payloads #32

Use less memory while parsing initial large payloads #32

Conversation

sw-square commented May 25, 2022

Brief Summary

Regex

Duplicating the line

keelerm84 commented May 27, 2022

keelerm84 May 31, 2022

Choose a reason for hiding this comment

eli-darkly Jun 9, 2022

Choose a reason for hiding this comment

eli-darkly Jun 9, 2022 • edited Loading

Choose a reason for hiding this comment

sw-square Jun 10, 2022

Choose a reason for hiding this comment

eli-darkly Jun 13, 2022

Choose a reason for hiding this comment

eli-darkly Jun 13, 2022

Choose a reason for hiding this comment

sw-square Jun 14, 2022

Choose a reason for hiding this comment

eli-darkly Jun 14, 2022

Choose a reason for hiding this comment

eli-darkly left a comment

Choose a reason for hiding this comment

eli-darkly commented Jun 16, 2022

eli-darkly Jun 9, 2022 •

edited

Loading