[Ruby] Memory leak from decoding payloads in version 3.15 and later #9467
Comments
Hi, thanks for the detailed report. The clear numbers and simple repro will make it easy to investigate further. Your analysis clearly shows memory differences from 3.15.0 onward, and these warrant further investigation to see how we can mitigate them. But as a matter of terminology, this is not a memory leak: if it were truly a leak, memory usage would keep growing even with manual GC calls. I say this partly to clarify what the proper remedy will be. If this were a true leak, we would need to search for places where we are somehow allocating memory without freeing it, or holding onto references improperly from some global object. But this is not the case: your tests show that manual GC calls cause us to reach a clear steady state. This means that all memory is accounted for and will be freed eventually; the issue is just that it is not freed soon enough. Going back to 3.14.x is not an option. Far too many fixes have happened since then. We need to find a way to push forward.
As a random thought, in some GC systems a manual GC call pushes harder on weak references than an automatic one. In effect, some GCs have concepts like partial collection vs. full collection, and explicit calls trigger a full collection.
Yes, agreed this is not a memory leak per se. A leak-like behavior, if you will, but I couldn't think of a better way to describe it in the title. Feel free to update it as you see fit.
It turns out that Ruby does have a mechanism by which extensions can report how much memory a given object is taking!
If we implement this function on our
We have what looks like a very effective fix for this in #9586
This should be fixed by #9586.
What version of protobuf and what language are you using?
Version: all versions 3.15.0 and up
Language: Ruby
What operating system (Linux, Windows, ...) and version?
reproduced on Linux (Ubuntu 20.04.3) and Mac (macOS 11.6.3 intel)
What runtime / compiler are you using (e.g., python version or gcc version)
reproduced with Ruby 2.7.3, 3.0.3, 3.1.0
What did you do?
Steps to reproduce the behavior:
Run the test program below using different gem versions. The program runs a loop decoding a large payload (1.8MB) and periodically reports memory stats by calling ps on itself. Try with versions 3.14.0, 3.15.8, 3.16.0, 3.17.3, 3.18.2, and 3.19.4. 3.14.0 works well, while all later versions show memory growing continuously until the process gets killed. Increase the reps as needed (default 5000) to see the memory exhaustion on your machine.
Output showing stable memory usage with google-protobuf 3.14.0:
Output with google-protobuf 3.19.4. The same pattern is seen with 3.15.8, 3.16.0, 3.17.3, and 3.18.2 versions.
What did you expect to see
Memory usage should remain stable.
What did you see instead?
Memory usage kept climbing until the process was oom-killed.
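A minimal sketch of the kind of measurement loop described in the steps above, for readers who want to reproduce the numbers. The helper names `rss_kb` and `run_reps` are hypothetical, and the block passed to `run_reps` is only a placeholder: in a real repro it would call the decode method of a generated message class on the 1.8MB payload, which is assumed here rather than shown.

```ruby
# Hypothetical helper: resident set size of a process in KB, read from ps.
def rss_kb(pid = Process.pid)
  `ps -o rss= -p #{pid}`.strip.to_i
rescue Errno::ENOENT
  0 # ps is not installed; report 0 rather than abort the run
end

# Hypothetical driver: run the block `reps` times and sample RSS
# every `report_every` iterations. Returns the list of samples.
def run_reps(reps: 5000, report_every: 500)
  samples = []
  reps.times do |i|
    yield i
    next unless ((i + 1) % report_every).zero?

    samples << rss_kb
    puts format("rep %5d  rss %8d KB", i + 1, samples.last)
  end
  samples
end

# Placeholder workload: replace the block body with the real decode call.
payload = "x" * 1_800_000
run_reps(reps: 1000, report_every: 250) { payload.bytesize }
```

With a stable version the reported RSS should level off after warm-up; with an affected version it would be expected to keep rising from sample to sample.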
This seems to be the result of the upb arena allocation change in #8184 and Ruby being unaware of the memory growth outside the Ruby heap. When the test program is run with forced GC the memory usage drops back to a stable level, as shown below, albeit at a higher level than with 3.14.0. Major or minor GC doesn't seem to make much difference.
My suspicion is that the decode calls leave many garbage-collectible Ruby objects that don't take up much room in the Ruby heap while referencing a large amount of upb memory, so Ruby sees no need to run GC. The forced GC calls worked in this test, but in my actual application they were only good for slowing down the memory growth, and even at that level the growth was much too fast, whereas with 3.14.0 the memory usage remained nearly flat.
There was another recently reported issue that sounded similar. It was closed due to lack of action after a suggestion to run GC, but I feel forcing GC is not an acceptable solution. It doesn't contain the issue well enough, as I observed in my application, and besides, it should not be necessary to force or tune GC just to use the protobuf gem.
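The forced-GC variant of the test can be approximated with a small wrapper like the one below. This is a sketch under assumptions: `with_forced_gc` is a hypothetical name, the interval is arbitrary, and the block stands in for the real decode call. `GC.start(full_mark: true)` requests a major collection and `full_mark: false` a minor one, which allows comparing the two.

```ruby
# Hypothetical wrapper: run the block `reps` times, forcing a GC every
# `gc_every` iterations. Returns how many GC runs happened during the loop
# (forced and automatic combined), as counted by GC.stat(:count).
def with_forced_gc(reps:, gc_every:, full_mark: true)
  before = GC.stat(:count)
  reps.times do |i|
    yield i
    GC.start(full_mark: full_mark) if ((i + 1) % gc_every).zero?
  end
  GC.stat(:count) - before
end

# Placeholder workload: replace the block body with the real decode call.
runs = with_forced_gc(reps: 100, gc_every: 10) { "x" * 100_000 }
puts "GC runs during loop: #{runs}"
```

Comparing the count returned with and without the forced calls gives a rough idea of how rarely Ruby decides to collect on its own when most of the memory lives outside the Ruby heap.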
I can see a few different ways to resolve this:
Option 3 is easy to attain, and it would be great to make it happen while the proper fix (option 1) is investigated.