nsqd: fix memory leak with large message #996
Conversation
This was done intentionally. The tradeoff here is that we can vastly reduce allocations by maintaining the buffer at the connection level. Although your analysis is correct, in the common case, we don't expect wildly different message sizes to be consumed on a single connection (channel).
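For context, a minimal sketch of the connection-level reuse pattern being described; pump and its plain []byte messages are simplified stand-ins, not the actual nsqd message pump:

```go
package sketch

import (
	"bytes"
	"io"
)

// pump reuses one bytes.Buffer for every message sent on a connection.
// Reset keeps the underlying array, so when messages are similarly sized
// (the common case described above), steady-state sends allocate nothing.
func pump(w io.Writer, messages <-chan []byte) error {
	var buf bytes.Buffer // one buffer per connection
	for body := range messages {
		buf.Reset()
		buf.Write(body) // stand-in for the real message framing
		if _, err := w.Write(buf.Bytes()); err != nil {
			return err
		}
	}
	return nil
}
```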
I do not think we can control this, as messages are application-specific and every message is fanned out to every channel. Neither the nsqd server side nor the go-nsq/pynsq client side has an option to control how message load is balanced between different connections.
--max-msg-size should work to limit message size and thus the bytes.Buffer size. (Though, as you say, there will be one per connection.)
We could possibly try using a per-connection
@jehiah's suggestion seems reasonable, but have you actually observed an issue in your use case @andyxning, or is this just theory?
@mreiferson Yes, the memory leak has a big impact on our usage because we set
The key point is to implement the shrink logic when large buffers are not needed. I think this can be implemented with
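One possible shape for such shrink logic, purely as a sketch; maybeShrink and maxRetainedCap are hypothetical names, not part of nsqd:

```go
package sketch

import "bytes"

// maxRetainedCap is a hypothetical threshold, not an existing nsqd option.
const maxRetainedCap = 1 << 20 // 1MB

// maybeShrink returns a fresh buffer when the current one has grown past the
// threshold, so a single huge message does not pin a large backing array for
// the lifetime of the connection; otherwise it resets and reuses the buffer.
func maybeShrink(buf *bytes.Buffer) *bytes.Buffer {
	if buf.Cap() > maxRetainedCap {
		return &bytes.Buffer{} // drop the oversized array and let the GC reclaim it
	}
	buf.Reset()
	return buf
}
```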
Ping @mreiferson @jehiah
Ping @mreiferson @jehiah :)
@andyxning those graphs are helpful, can you run the suite of benchmarks before and after and post the comparison?
@mreiferson OK, will give the feedback ASAP.
I use v0.3.8 internally, so the tests are based on v0.3.8 using ./bench.sh. I have no available AWS resources to do a distributed performance test, so below is only a single-instance performance test.
Official (i.e., the v0.3.8 branch):
Adjusted (i.e., this patch applied on top of the v0.3.8 branch):
It seems the sub performance does degrade somewhat, but the difference is quite small; I don't think it is a major performance penalty. WDYT?
OK, I'm satisfied with these results, thanks for indulging me @andyxning. Any final concerns before merging @ploxiln @jehiah?
nsqd/protocol_v2.go (outdated)
@@ -318,6 +316,7 @@ func (p *protocolV2) messagePump(client *clientV2, startedChan chan bool) {
 subChannel.StartInFlightTimeout(msg, client.ID, msgTimeout)
 client.SendingMessage()
+var buf bytes.Buffer
 err = p.SendMessage(client, msg, &buf)
Since we're no longer re-using the buf, you can simplify this further: declare it inside SendMessage(), and remove the argument.
Good suggestion! Done. PTAL. @ploxiln
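A minimal sketch of what that simplification looks like, using stand-in types rather than nsqd's real clientV2/Message plumbing:

```go
package sketch

import (
	"bytes"
	"io"
)

// message is a stand-in for nsqd's Message type in this sketch.
type message struct {
	body []byte
}

// sendMessage declares its buffer locally, per call, instead of receiving a
// long-lived *bytes.Buffer from the caller. The buffer, and whatever large
// backing array it grew, becomes garbage as soon as this function returns.
func sendMessage(w io.Writer, msg *message) error {
	var buf bytes.Buffer
	buf.Write(msg.body) // stand-in for nsqd's real message framing
	_, err := w.Write(buf.Bytes())
	return err
}
```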
Do we want this rebased on a more recent commit?
I locally rebased on master and compared ./bench.sh results, and also saw a small and inconclusive performance reduction, maybe 1%, but it depends more on whether my laptop had 4 or 8 seconds to cool down between runs than on which branch I'm testing.
Force-pushed from d490e48 to 3d05881
@mreiferson Seems there are no additional suggestions from @ploxiln and @jehiah. Can you help merge this?
Looks good to me. I also re-bench-tested out of paranoia, same results. I'll merge in 12 hours or so if there are no objections.
Force-pushed from 3d05881 to 725c653
fixed up commit msg, thanks @andyxning
Thanks for the review and advice! Learned a lot. @mreiferson @ploxiln @jehiah
This PR makes bytes.Buffer a local variable for each message instead of a global one for each connection.

Once we have a very large message, for example 20MB, after we SendMessage with the global bytes.Buffer, the underlying array is expanded to more than 20MB to contain this large message. However, after the large message is sent, the bytes.Buffer does not shrink to reduce memory consumption until the connection is closed and the bytes.Buffer is GCed. And once even larger messages need to be processed on that connection, the bytes.Buffer is enlarged further and further.

We should make bytes.Buffer a local variable for each message. This will increase CPU usage somewhat, since there are more objects to be GCed, but after several iterations of GC improvements in Go, GC performance is good enough in recent releases (>= 1.8), so the GC overhead should be acceptable. In fact, in one of our production clusters, after deploying this change, memory usage decreased dramatically while CPU usage stayed almost the same. (Disclaimer: our internal nsqd is compiled with Go 1.8.4.) :)

/cc @mreiferson @jehiah @ploxiln
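To make the growth-without-shrink behavior concrete, here is a small standalone Go program (not nsqd code) demonstrating that a bytes.Buffer keeps its enlarged capacity after Reset:

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buf bytes.Buffer

	// Simulate one 20MB message passing through a reused, connection-level buffer.
	buf.Write(bytes.Repeat([]byte("x"), 20<<20))
	fmt.Println("cap after 20MB write:", buf.Cap()) // > 20MB

	// Reset empties the buffer but keeps the backing array, so the ~20MB
	// stays allocated for as long as the buffer (i.e. the connection) lives.
	buf.Reset()
	fmt.Println("cap after Reset:", buf.Cap()) // still > 20MB

	// A buffer declared per message, by contrast, goes out of scope after
	// each send, and its backing array can be reclaimed by the GC.
}
```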