Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ruby] Memory leak from decoding payloads in version 3.15 and later #9467

Closed
jhahm981 opened this issue Feb 3, 2022 · 6 comments
Closed
Assignees
Labels

Comments

@jhahm981
Copy link

jhahm981 commented Feb 3, 2022

What version of protobuf and what language are you using?
Version: all versions 3.15.0 and up
Language: Ruby

What operating system (Linux, Windows, ...) and version?
reproduced on Linux (Ubuntu 20.04.3) and Mac (macOS 11.6.3 intel)

What runtime / compiler are you using (e.g., python version or gcc version)
reproduced with Ruby 2.7.3, 3.0.3, 3.1.0

What did you do?
Steps to reproduce the behavior:

Run the test program below using different gem versions. The program runs a loop decoding a large payload (1.8MB) and periodically reports memory stat by calling ps on itself. Try with versions 3.14.0, 3.15.8, 3.16.0, 3.17.3, 3.18.2, and 3.19.4. 3.14.0 works well, while all later versions show memory growing continuously until the process gets killed. Increase the reps as needed (default 5000) to see the memory exhaustion on your machine.

def prompt(message, default)
  print "#{message} [#{default}] "
  answer = gets.chomp
  answer != "" ? answer : default
end

GEM_VERSION = prompt("google-protobuf gem version:", "3.19.4")
REPS = prompt("Total reps:", 5000).to_i
REPORT_EVERY = prompt("Report every N reps:", 200).to_i
gc_type = prompt("GC after #{REPORT_EVERY} reps (major/minor/none):", "none")
MAJOR_GC = gc_type == "major"
MINOR_GC = gc_type == "minor"

ARRAY_SIZE = 100000

# rss and vsz in kilobytes, with thousandth separator
def memory_stat
  rss, vsz = `ps -o rss,vsz -p #{Process.pid} | grep -v RSS`.chomp.split(/\s+/, 2)
  "rss #{rss.to_s.gsub(/\B(?=(...)*\b)/, ',')}    vsz #{vsz.to_s.gsub(/\B(?=(...)*\b)/, ',')}"
end

def gc_and_report_progress(current_rep, elapsed)
  memory_before_gc = memory_stat
  first_rep = current_rep - REPORT_EVERY + 1
  rate = (elapsed / REPORT_EVERY * 1000).round(1)
  reps_and_timing = "#{first_rep}..#{current_rep}:  #{rate} ms/rep"
  if MAJOR_GC || MINOR_GC
    GC.start(full_mark: MAJOR_GC)
    memory_after_gc = memory_stat
    puts "#{reps_and_timing}    #{memory_before_gc}    (post-GC:  #{memory_after_gc})"
  else
    puts "#{reps_and_timing}    #{memory_before_gc}"
  end
end

puts "Using Ruby #{RUBY_VERSION} #{RUBY_PLATFORM}"

require "bundler/inline"
gemfile(true) do
  source "https://rubygems.org"
  gem "google-protobuf", GEM_VERSION
end

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_message "leak.TestMessage" do
    repeated :array, :string, 1
  end
end

module Leak
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("leak.TestMessage").msgclass
end

# generate a big payload to decode
array = (1..ARRAY_SIZE).map { |n| "some_string_#{n}" }
proto = Leak::TestMessage.new(array: array)
payload = Leak::TestMessage.encode(proto)
puts "Encoded payload length: #{payload.length}"

GC.start
puts "Memory before test loop:    #{memory_stat}"

puts "Starting test loop..."
rep = 0
start_time = Time.now
overall_start_time = start_time
REPS.times do
  Leak::TestMessage.decode(payload)

  rep += 1
  if rep % REPORT_EVERY == 0
    elapsed = Time.now - start_time
    gc_and_report_progress(rep, elapsed)
    start_time = Time.now
  end
end
overall_elapsed = Time.now - overall_start_time

puts "Total test time: #{overall_elapsed.round(1)} seconds"
GC.start
puts "Memory after final GC:    #{memory_stat}"

Output showing stable memory usage with google-protobuf 3.14.0:

google-protobuf gem version: [3.19.4] 3.14.0
Total reps: [5000]
Report every N reps: [200]
GC after 200 reps (major/minor/none): [none]
Using Ruby 2.7.3 x86_64-linux
Fetching gem metadata from https://rubygems.org/.
Resolving dependencies...
Using bundler 2.3.5
Using google-protobuf 3.14.0 (x86_64-linux)
Encoded payload length: 1888895
Memory before test loop:    rss 132,340    vsz 727,684
Starting test loop...
1..200:  14.8 ms/rep    rss 145,872    vsz 748,448
201..400:  14.6 ms/rep    rss 145,872    vsz 748,448
401..600:  14.6 ms/rep    rss 145,872    vsz 748,448
601..800:  14.8 ms/rep    rss 145,872    vsz 748,448
801..1000:  14.6 ms/rep    rss 145,872    vsz 748,448
1001..1200:  14.6 ms/rep    rss 145,872    vsz 748,448
1201..1400:  14.5 ms/rep    rss 145,872    vsz 748,448
1401..1600:  14.6 ms/rep    rss 145,872    vsz 748,448
1601..1800:  14.5 ms/rep    rss 145,872    vsz 748,448
1801..2000:  14.9 ms/rep    rss 145,872    vsz 748,448
2001..2200:  14.5 ms/rep    rss 145,872    vsz 748,448
2201..2400:  14.5 ms/rep    rss 145,872    vsz 748,448
2401..2600:  14.5 ms/rep    rss 145,872    vsz 748,448
2601..2800:  14.6 ms/rep    rss 145,872    vsz 748,448
2801..3000:  14.5 ms/rep    rss 145,872    vsz 748,448
3001..3200:  14.9 ms/rep    rss 145,872    vsz 748,448
3201..3400:  14.7 ms/rep    rss 145,872    vsz 748,448
3401..3600:  15.0 ms/rep    rss 145,872    vsz 748,448
3601..3800:  14.9 ms/rep    rss 145,872    vsz 748,448
3801..4000:  14.9 ms/rep    rss 145,872    vsz 748,448
4001..4200:  15.0 ms/rep    rss 145,872    vsz 748,448
4201..4400:  15.2 ms/rep    rss 145,872    vsz 748,448
4401..4600:  15.1 ms/rep    rss 145,872    vsz 748,448
4601..4800:  14.9 ms/rep    rss 145,872    vsz 748,448
4801..5000:  15.1 ms/rep    rss 145,872    vsz 748,448
Total test time: 73.9 seconds
Memory after final GC:    rss 145,872    vsz 748,448

Output with google-protobuf 3.19.4. The same pattern is seen with 3.15.8, 3.16.0, 3.17.3, and 3.18.2 versions.

google-protobuf gem version: [3.19.4] 3.19.4
Total reps: [5000]
Report every N reps: [200]
GC after 200 reps (major/minor/none): [none]
Using Ruby 2.7.3 x86_64-linux
Fetching gem metadata from https://rubygems.org/.
Resolving dependencies...
Using bundler 2.3.5
Using google-protobuf 3.19.4 (x86_64-linux)
Encoded payload length: 1888895
Memory before test loop:    rss 139,244    vsz 736,172
Starting test loop...
1..200:  8.9 ms/rep    rss 1,457,924    vsz 2,567,064
201..400:  8.9 ms/rep    rss 2,776,620    vsz 4,409,840
401..600:  8.9 ms/rep    rss 4,101,372    vsz 6,252,832
601..800:  8.9 ms/rep    rss 5,426,388    vsz 8,095,808
801..1000:  8.9 ms/rep    rss 6,751,140    vsz 9,938,784
1001..1200:  8.9 ms/rep    rss 8,075,892    vsz 11,781,880
1201..1400:  8.9 ms/rep    rss 9,400,908    vsz 13,624,968
1401..1600:  8.8 ms/rep    rss 10,725,924    vsz 15,468,052
1601..1800:  8.8 ms/rep    rss 12,050,940    vsz 17,311,140
1801..2000:  8.9 ms/rep    rss 13,375,956    vsz 19,154,260
2001..2200:  8.9 ms/rep    rss 14,700,972    vsz 20,997,640
2201..2400:  8.8 ms/rep    rss 16,025,988    vsz 22,840,796
2401..2600:  8.7 ms/rep    rss 17,351,004    vsz 24,683,952
2601..2800:  8.7 ms/rep    rss 18,676,020    vsz 26,527,108
2801..3000:  8.7 ms/rep    rss 20,001,036    vsz 28,370,264
3001..3200:  8.7 ms/rep    rss 21,326,052    vsz 30,213,420
3201..3400:  8.7 ms/rep    rss 22,651,068    vsz 32,056,580
3401..3600:  8.7 ms/rep    rss 23,976,084    vsz 33,899,736
3601..3800:  8.8 ms/rep    rss 25,301,100    vsz 35,742,892
3801..4000:  8.9 ms/rep    rss 26,626,116    vsz 37,586,048
4001..4200:  8.9 ms/rep    rss 27,951,132    vsz 39,429,652
4201..4400:  8.9 ms/rep    rss 29,276,148    vsz 41,272,808
4401..4600:  8.9 ms/rep    rss 30,601,164    vsz 43,115,964
Killed

What did you expect to see

Memory usage should remain stable.

What did you see instead?

Memory usage kept climbing until the process was oom-killed.

This seems to be the result of the upb arena allocation change in #8184 and Ruby being unaware of the memory growth outside the Ruby heap. When the test program is run with forced GC the memory usage drops back to a stable level, as shown below, albeit at a high level than with 3.14.0. Major or minor GC doesn't seem to make much difference.

google-protobuf gem version: [3.19.4]
Total reps: [5000]
Report every N reps: [200]
GC after 200 reps (major/minor/none): [none] minor
Using Ruby 2.7.3 x86_64-linux
Fetching gem metadata from https://rubygems.org/.
Resolving dependencies...
Using bundler 2.3.5
Using google-protobuf 3.19.4 (x86_64-linux)
Encoded payload length: 1888895
Memory before test loop:    rss 142,464    vsz 742,608
Starting test loop...
1..200:  8.9 ms/rep    rss 1,461,672    vsz 2,573,804    (post-GC:  rss 972,104    vsz 1,892,152)
201..400:  7.2 ms/rep    rss 1,527,084    vsz 2,573,804    (post-GC:  rss 1,054,556    vsz 1,915,180)
401..600:  7.1 ms/rep    rss 1,576,480    vsz 2,573,800    (post-GC:  rss 1,086,696    vsz 1,892,152)
601..800:  7.4 ms/rep    rss 1,683,212    vsz 2,573,796    (post-GC:  rss 1,210,852    vsz 1,915,172)
801..1000:  8.2 ms/rep    rss 1,557,628    vsz 2,573,792    (post-GC:  rss 1,071,860    vsz 1,896,744)
1001..1200:  7.1 ms/rep    rss 1,595,896    vsz 2,573,784    (post-GC:  rss 1,112,776    vsz 1,901,348)
1201..1400:  7.1 ms/rep    rss 1,639,188    vsz 2,573,784    (post-GC:  rss 1,158,084    vsz 1,903,644)
1401..1600:  7.3 ms/rep    rss 1,683,476    vsz 2,573,780    (post-GC:  rss 1,213,716    vsz 1,919,760)
1601..1800:  8.2 ms/rep    rss 1,558,784    vsz 2,573,776    (post-GC:  rss 1,088,912    vsz 1,919,760)
1801..2000:  7.0 ms/rep    rss 1,596,844    vsz 2,573,772    (post-GC:  rss 1,126,956    vsz 1,919,756)
2001..2200:  7.1 ms/rep    rss 1,639,640    vsz 2,573,772    (post-GC:  rss 1,169,748    vsz 1,919,756)
2201..2400:  7.3 ms/rep    rss 1,683,484    vsz 2,573,764    (post-GC:  rss 1,207,020    vsz 1,910,536)
2401..2600:  8.2 ms/rep    rss 1,559,524    vsz 2,573,760    (post-GC:  rss 1,082,972    vsz 1,910,532)
2601..2800:  7.1 ms/rep    rss 1,597,240    vsz 2,573,756    (post-GC:  rss 1,127,292    vsz 1,919,736)
2801..3000:  7.1 ms/rep    rss 1,639,712    vsz 2,573,752    (post-GC:  rss 1,163,284    vsz 1,910,524)
3001..3200:  7.3 ms/rep    rss 1,683,040    vsz 2,573,748    (post-GC:  rss 1,206,600    vsz 1,910,520)
3201..3400:  8.2 ms/rep    rss 1,559,812    vsz 2,573,744    (post-GC:  rss 1,083,324    vsz 1,910,516)
3401..3600:  7.1 ms/rep    rss 1,597,328    vsz 2,573,740    (post-GC:  rss 1,120,732    vsz 1,910,512)
3601..3800:  7.1 ms/rep    rss 1,639,488    vsz 2,573,740    (post-GC:  rss 1,156,460    vsz 1,901,300)
3801..4000:  7.3 ms/rep    rss 1,682,944    vsz 2,573,732    (post-GC:  rss 1,199,888    vsz 1,901,292)
4001..4200:  8.2 ms/rep    rss 1,559,956    vsz 2,573,728    (post-GC:  rss 1,076,772    vsz 1,901,288)
4201..4400:  7.1 ms/rep    rss 1,596,848    vsz 2,573,724    (post-GC:  rss 1,113,732    vsz 1,901,284)
4401..4600:  7.1 ms/rep    rss 1,638,824    vsz 2,573,720    (post-GC:  rss 1,155,700    vsz 1,901,280)
4601..4800:  7.3 ms/rep    rss 1,681,968    vsz 2,573,716    (post-GC:  rss 1,192,176    vsz 1,892,064)
4801..5000:  8.2 ms/rep    rss 1,559,988    vsz 2,573,712    (post-GC:  rss 1,070,244    vsz 1,892,064)
Total test time: 38.2 seconds
Memory after final GC:    rss 1,070,244    vsz 1,892,064

My suspicion is that the decode calls leave many garbage-collectible Ruby objects that don't take up much room in the Ruby heap while referencing to a large amount of upb memory, and Ruby sees no need to run GC. The forced GC calls worked in this test, but in my actual application it was only good for slowing down the memory growth, and even at that level the growth was much too fast whereas with 3.14.0 the memory usage remained nearly flat.

There was another recently report issue that sounded similar. It was closed due to lack of action after a suggestion to run GC, but I feel forcing GC is not an acceptable solution It doesn't contain the issue well enough as I observed in my application, and besides it should not be necessary to force or tune GC just to use the protobuf gem.

I can see a few different ways to resolve this:

  1. Find a way to fix the leaky behavior and keep memory usage low and stable, like with 3.14.0. This is the ideal solution.
  2. Reintroduce the non-upb allocation strategy as an option. But this will understandably add maintenance burden, as @haberman explained in this comment.
  3. Commit to backporting security fixes and some key features (such as Apple silicon support) to 3.14.x. This would work well for users who value stability over improved performance. (myself included)

Option 3 is easy to attain, and it would be great to make it happen while the proper fix (option 1) is investigated.

@haberman
Copy link
Member

haberman commented Feb 3, 2022

Hi, thanks for the detailed report. The clear numbers and simple repro will make it easy to investigate further.

Your analysis clearly shows memory differences in 3.15.0 on, and these warrant further investigation to see how we can mitigate them. But as a matter of terminology, this is not a memory leak: if it truly was a leak, the memory usage would grow even with manual GC calls.

I say this partly to clarify what the proper remedy will be. If this was a true leak, we would need to be searching for places where we are somehow allocating memory without freeing it, or holding onto references improperly from some global object. But this is not the case: your tests show that manual GC calls cause us to reach a clear steady state. This means that all memory is accounted for and will be freed eventually, the issue is just that is it not freed soon enough.

Going back to 3.14.x is not an option. Far too many fixes have happened since then. We need to find a way to push forward.

@fowles
Copy link
Contributor

fowles commented Feb 3, 2022

As a random thought, some GC systems a manual GC call pushes harder on weak references than an automatic one. In effect some GCs have concepts like partial collection vs full collection and the explicit calls trigger a full collection.

@jhahm981
Copy link
Author

jhahm981 commented Feb 4, 2022

Yes, agreed this is not a memory leak per se. A leak-like behavior, if you will, but I couldn't think of a better way to describe it in the title. Feel free to update it as you see fit.

@elharo elharo added the ruby label Feb 4, 2022
@haberman haberman self-assigned this Feb 4, 2022
@haberman
Copy link
Member

It turns out that Ruby does have a mechanism by which extensions can report how much memory a given object is taking!

dsize calculates memory consumption in bytes by the struct. Its parameter is a pointer to your struct. You can pass 0 as dsize if it is hard to implement such a function. But it is still recommended to avoid 0.

If we implement this function on our Arena object, hopefully it will convince Ruby to run the GC more often.

@haberman
Copy link
Member

haberman commented Mar 8, 2022

We have what looks like a very effective fix for this in #9586

@haberman
Copy link
Member

haberman commented Mar 8, 2022

This should be fixed by #9586.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants