Use less memory while parsing initial large payloads #32
Conversation
Thank you for this contribution! I will try to get to this PR today, but it might not be until Monday. I didn't want to leave you hanging over the weekend without a response, though.
```ruby
name = line.slice!(0...colon)
next unless name.match?(/^\w+$/)

# delete the colon followed by an optional space
line.delete_prefix!(':')
line.delete_prefix!(' ')
```
I can definitely see how this change will greatly reduce the memory usage, particularly when the payload isn't a comment.

However, if a comment is sent, I think the next bit of logic can have its memory consumption further reduced. Mutating the string in place seems to take more memory than calling the non-destructive `delete_prefix`. Maybe something like this?
```diff
-name = line.slice!(0...colon)
-next unless name.match?(/^\w+$/)
-# delete the colon followed by an optional space
-line.delete_prefix!(':')
-line.delete_prefix!(' ')
+name = line.slice(0...colon)
+next unless name.match?(/^\w+$/)
+# delete the colon followed by an optional space
+line = line.slice(colon...).delete_prefix(':').delete_prefix(" ")
```
What do you think?
> Mutating the string in place seems to take more memory than calling the non-destructive `delete_prefix`

I accept that that's a finding you've seen, but it's very counter-intuitive to me: I don't see how allocating a brand-new string could possibly take less memory than mutating a string in place, so I'm puzzled.
And if we are OK with allocating a brand-new string, then I think it's safe to say that the most efficient approach would be to find out how many characters we want to exclude and then take a single substring starting past those characters, rather than making three separate calls that compute intermediate strings. Similarly, if we want to mutate the existing string, I would favor getting the `name` substring first, then computing the offset past the colon/space and calling `slice!` once, instead of calling `slice!` followed by two `delete_prefix!` calls.
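A minimal standalone sketch of that single-substring idea (the `parse_field` helper name and return shape are mine, not the PR's actual code):

```ruby
# Sketch only: parse an SSE "name: value" line with one index lookup
# and a single substring allocation, instead of slice! plus two
# delete_prefix! calls. parse_field is a hypothetical helper name.
def parse_field(line)
  colon = line.index(':')
  return nil unless colon

  name = line[0...colon]
  return nil unless name.match?(/^\w+$/)

  offset = colon + 1                  # skip past the ':'
  offset += 1 if line[offset] == ' '  # and one optional space
  [name, line[offset..] || '']
end
```

The value is produced by one `line[offset..]` call, so no intermediate strings are built for the colon or the space.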
That's really interesting that mutating in-place takes more memory; I would never have guessed that. I'm OK with this change. The regex was the big culprit; the string handling was more minor.
I would still prefer to be doing a single `slice!` here, rather than `slice!` followed by `delete_prefix!` twice. If `line` contains a very large piece of data (which is the use case we're concerned about in the first place), then those two `delete_prefix!` calls each cause the whole buffer to be shifted left one character at a time. It should be simple enough to instead just add 1 to `colon` (there is no real decision being made by that first `delete_prefix!`; we already know for sure that the character at that position is `':'`), then check whether the next character (if any) is a space and increment it again if so.
I mean, I understand that the regex usage was a bigger problem and this is better than that, but that's no reason to still be incurring unnecessary overhead in the improved code.
In fact, we could remove one of the `delete_prefix!`s by simply using a ternary operator. Something along the lines of `delete_prefix! space ? ': ' : ':'`.
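That ternary idea could be sketched like this, standalone, with the `space` check inlined (not the PR's actual code):

```ruby
# Sketch of the ternary idea: a single delete_prefix! call that strips
# ": " or ":" depending on whether a space follows the colon.
line = +"data: hello"        # +"" ensures the string is mutable
line.delete_prefix!("data")  # the name is already removed in the real code
space = line[1] == ' '       # is there a space after the colon?
line.delete_prefix!(space ? ': ' : ':')
```

This halves the number of buffer shifts, though the whole remaining buffer is still shifted once.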
What I'm saying is that you can remove both of them. There is no need to mutate the string more than once, and I'd prefer that we not do so, given that the point of these changes is to reduce unnecessary overhead when the string is very long.
```ruby
name = line.slice(0...colon)
colon += 1
colon += 1 if colon < line.length && line[colon] == ' '
line.slice!(0...colon)
```
I think at this point we can merge the PR and take care of any final polishing ourselves.
The 6.3.3 release of the Ruby SDK includes this fix. Thanks!
Brief Summary
Similar to #20, when the Ruby Server SDK calls Eventsource with large payloads of JSON data, a large amount of memory is used during initial parsing. The culprit seems to be the use of a regex to match the data.
Regex
The cause of the memory issues seems to be here.

It seems that about 20x `line`'s length is allocated when matching the regex, which, when given 10MB+ of data, can easily top 200MB. The solution is to instead find the index of the colon itself and then modify the line in place, avoiding the need to match the regex against the entire input.
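As a rough sketch of that colon-index approach (following the shape of the diff above; the `parse_field!` helper name is made up here):

```ruby
# Sketch of the colon-index approach: find the colon, slice the name off,
# then strip the ':' and an optional following space in place, so the
# regex only ever sees the short field name, never the whole payload.
def parse_field!(line)
  colon = line.index(':')
  return nil unless colon

  name = line.slice!(0...colon)  # removes the name from line, returns it
  return nil unless name.match?(/^\w+$/)

  line.delete_prefix!(':')       # delete the colon
  line.delete_prefix!(' ')       # and an optional space
  name
end
```

After the call, `line` has been mutated to hold only the field's value.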
Duplicating the line
In the process of finding the regex memory issue, another, more minor one came up here:
When this code is executed and `@have_data` is false, the entire `value` string is copied into `@data`. A simple fix is to reuse the original string instead, so that it never gets duplicated.
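The SDK's actual buffering code isn't reproduced above, but the idea can be sketched roughly like this (class and method names are illustrative only):

```ruby
# Hedged sketch: on the first data line, keep the incoming string itself
# instead of copying its bytes into a fresh buffer; only later lines append.
class DataBuffer
  attr_reader :data

  def initialize
    @data = ''
    @have_data = false
  end

  def append(value)
    if @have_data
      @data << "\n" << value  # later lines: append as before
    else
      @data = value           # first line: reuse value, no byte copy
      @have_data = true
    end
  end
end
```

One caveat of this design: after the first `append`, `@data` aliases the caller's string, so the caller must not go on mutating it.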