[Ruby] Increase in memory footprint in 3.15 #8337
Comments
Thanks for the report and the self-contained repro. This is a result of #8184, which should significantly increase performance, but also changes the memory management model. I want to do more investigation to get a more detailed root cause. But for now I'll share an overview of the new memory management model, in case it gives you the info you need to fix the problem on your end.
To get a root cause for your issue, we should figure out why the arenas are growing so big. Messages end up in the same arena in a couple of ways:
You can't "break up" an existing arena, but you can use I am hoping that a few well-placed |
For your repro, is it possible to reduce it further so that it is just protobuf operations? |
Scratch everything I said before. It is all true, but I just noticed a detail that I think explains all of this better. I see that you're using Ruby 2.5. Due to a shortcoming in the Ruby Is it possible for you to upgrade Ruby to 2.7? I believe this will fix the problem you are seeing. |
I will also investigate if there is a better way of allowing this collection to happen for Ruby <2.7. |
Thanks for the quick response and for the explanations above. That is welcome context on the changes in behaviour. Regarding the Ruby version, I ran the same profiling on Ruby 2.7.2 with much better results.
And with 3.15
Unfortunately (for me) this is still a bit of a problem for applications on ruby <2.7 |
Pre-2.7 you can implement a sketchy WeakMap using https://ruby-doc.org/core-2.5.3/ObjectSpace.html#method-c-_id2ref |
@byroot thanks for the lead. What is the behavior of |
>> id = Object.new.object_id
=> 70330182748900
>> 4.times { GC.start }
=> 4
>> ObjectSpace._id2ref(id)
RangeError (0x00003ff7058f4ae4 is recycled object)
However, I suspect the address might be reused at some point, so it might lead to weird stuff. I'll have to consult with some people who know the GC better than I do. Another possibility is to use a mutable proxy object as the identifier, so as to comply with the pre-2.7 WeakMap, but I'll have to dig into the source a bit more. I've only followed the discussion, so I'm not sure what would be possible. |
One other thing to consider IMHO, is to just not do Arenas in pre 2.7. |
Ah I finally found the code I saw doing this sketchy object_id based weakref: https://github.com/ruby-concurrency/ref/blob/7a3991fea598edfe41d4e3a9e37b344bc7b7570d/lib/ref/weak_reference/pure_ruby.rb |
Yes I was worried about that.
Having two different memory management models co-existing in the code would dramatically complicate the C extension. PR #8184 (which changed from the old model to the new model) had a line count of +14,071 −23,328. If we left the old memory management model in, it would be more like +14,071 −0.
Ah looks like it is depending on finalizers to remove collected objects from the weak map? |
I think I'll try making the cache a regular |
That's understandable. Again, I've only glanced at the code, so I'm a bit clueless about what it does. Until I get the chance to dig into it, don't take my suggestions as more than stabs in the dark 😉
Yes, just like Ruby's So the only solution I can see is to find a light mutable object to serve as WeakMap key. |
How would that work? My |
I guess I could have a secondary non-weak So the two options I see are:
Maybe (1) is our best option here. |
That's what I was suggesting yes.
You can purge it from time to time by checking if the |
Scratch that part, I missed that this WeakMap implementation had a backreference check to avoid false-positives. So it could actually be an option to use it as a fallback for ruby < 2.7. |
Sounds like a pretty good approach, thanks for the ideas! I'll work on banging this out (my option (1) from above).
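The two-level scheme discussed above (a regular map from integer address to a light key object, plus a WeakMap keyed by that object) might look roughly like the following in plain Ruby. This is only an illustrative sketch: the class name `SketchObjectCache` and its methods are hypothetical, and the real change lives in the C extension and may differ in detail.

```ruby
# Hypothetical sketch of a cache that avoids Integer keys in the WeakMap,
# which pre-2.7 ObjectSpace::WeakMap cannot handle.
class SketchObjectCache
  def initialize
    @weak = ObjectSpace::WeakMap.new # key object -> wrapper (weakly held)
    @keys = {}                       # integer address -> key object (strongly held)
  end

  # Register a wrapper for the native object at `address`.
  def add(address, wrapper)
    key = (@keys[address] ||= Object.new)
    @weak[key] = wrapper
    wrapper
  end

  # Return the cached wrapper, or nil if none exists or it was collected.
  def get(address)
    key = @keys[address]
    key && @weak[key]
  end

  # Drop key objects whose wrappers have been garbage collected.
  # Intended to be called from time to time, not on every lookup.
  def purge
    @keys.delete_if { |_address, key| !@weak.key?(key) }
  end

  # Number of entries in the strong (secondary) map.
  def size
    @keys.size
  end
end

cache = SketchObjectCache.new
wrapper = Object.new          # stands in for a Ruby message wrapper
cache.add(0x1234, wrapper)    # 0x1234 is a made-up native address
cache.get(0x1234).equal?(wrapper)
```

The trade-off is that the strong map keeps one small key object per address until `purge` runs, so the purge frequency bounds how much stale bookkeeping can accumulate.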
It's one entry for every live protobuf message object in the process. But only for messages that users have accessed directly from Ruby (the Ruby wrappers are created lazily). For example, if you parse a protobuf binary payload that has 1,000 sub-messages, only 1 Ruby object is created for the top-level message, and only 1 object is in the cache. If the user then iterates over all 1,000 sub-message objects, now there are 1,000 Ruby objects and 1,000 objects in the cache. |
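The lazy wrapper creation described above can be modelled in a few lines of plain Ruby. This is a toy model under stated assumptions, not the extension's code: `SketchNativeNode` and `SketchWrapper` are hypothetical names, and the cache here is an ordinary Hash, whereas the real cache holds its wrappers weakly.

```ruby
# SketchNativeNode stands in for a parsed message that lives in native memory.
SketchNativeNode = Struct.new(:children)

class SketchWrapper
  @cache = {} # native node identity -> wrapper (a strong Hash in this toy model)

  class << self
    attr_reader :cache

    # Return the existing wrapper for a node, creating it only on first access.
    def wrap(node)
      @cache[node.object_id] ||= new(node)
    end
  end

  def initialize(node)
    @node = node
  end

  # Wrappers for sub-messages are created only when the caller asks for them.
  def children
    @node.children.map { |child| SketchWrapper.wrap(child) }
  end
end

root = SketchNativeNode.new(Array.new(1_000) { SketchNativeNode.new([]) })
top = SketchWrapper.wrap(root)
size_after_parse = SketchWrapper.cache.size    # only the top-level wrapper exists
top.children.each { |child| child }            # touch every sub-message
size_after_iteration = SketchWrapper.cache.size # top-level wrapper plus 1,000 sub-message wrappers
```

In this model the cache grows only as sub-messages are accessed, which mirrors the behaviour described in the comment.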
Does this make it immune to |
It works! #8341 Leaks should be fixed and performance is better too. |
I suspect this frees lots of GC time. |
The fix is released in 3.15.3. @robertlaurin please let me know if you still see the problem after upgrading. |
@haberman We're seeing some reproducible OOMs still in 3.15.3. I'll try to create a simple repro but the use case that triggers it right now is assigning a large number (~120k) of messages to a repeated field in a single parent message. |
@hatstand Are you overwriting the same element of the repeated field over and over, and expecting 120k-1 messages to be collected? Or is your repeated field 120k messages long? |
Our repeated field has 120k messages. Rough pseudocode:
Outer.new(
  inners: fetch_lots_of_things.map { |t| Inner.new(foo: t.field) }
) |
I see, so this isn't a GC issue, as everything is live and nothing is collectible. You are observing that this code consumes noticeably more memory than in 3.14? How much more? |
The old code would max out at about 500MiB whereas with 3.15.3 it's OOMing at 6GiB. |
Ok a >10x memory increase for this pattern definitely seems like a problem. I'll await your repro. |
This seems to be sufficient:
require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("inner.proto", :syntax => :proto3) do
    add_message "Inner" do
      optional :foo, :string, 1
      optional :bar, :string, 2
    end
  end
end
Inner = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("Inner").msgclass

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("outer.proto", :syntax => :proto3) do
    add_message "Outer" do
      repeated :inners, :message, 1, "Inner"
    end
  end
end
Outer = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("Outer").msgclass

outer_proto = Outer.new(
  inners: (1..120_000).map { |i| ::Inner.new(foo: i.to_s, bar: i.to_s) }
) |
I should have mentioned that this is ruby 2.7.1 on Linux too. |
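To put numbers on a comparison like the one above, one option is to record the process's resident set size (RSS) around the allocation, since memory held by the C extension is not necessarily visible to Ruby's own heap statistics. The helpers below are a rough sketch: `rss_kib` and `rss_growth_kib` are hypothetical names, the reading depends on `/proc/self/status` or `ps` being available, and RSS is a coarse signal that also includes allocator overhead.

```ruby
# Resident set size of the current process in KiB, or nil if it cannot be read.
def rss_kib
  status = "/proc/self/status"
  if File.readable?(status)
    line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
    return line.split[1].to_i if line
  end
  # Fallback for systems without /proc (for example macOS).
  out = begin
    `ps -o rss= -p #{Process.pid}`
  rescue SystemCallError
    ""
  end
  out.strip.empty? ? nil : out.strip.to_i
end

# Run the block and return [block result, RSS growth in KiB or nil].
def rss_growth_kib
  GC.start
  before = rss_kib
  result = yield
  GC.start
  after = rss_kib
  growth = before && after ? after - before : nil
  [result, growth]
end

# Stand-in workload; the result is kept alive so it is counted in the second reading.
kept, growth = rss_growth_kib { Array.new(200_000) { |i| i.to_s * 4 } }
puts "RSS grew by roughly #{growth.inspect} KiB while holding #{kept.size} strings"
```

Wrapping the `Outer.new(...)` call from the repro in `rss_growth_kib { ... }` and running it once per gem version would give a like-for-like comparison, assuming the same Ruby version and machine.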
GitHub still uses legacy versions of the protobuf gem. This makes it challenging to upgrade to newer versions of the OTLP Exporter, which only supports versions `~> 3.19`. This change loosens the restrictions to allow GitHub to adopt newer versions of the protobuf definitions using an older version of the library. This change explicitly skips 3.15 due to bugs like protocolbuffers/protobuf#8337
* chore(deps): Allow google-protobuf `~> 3.14` GitHub still uses legacy versions of the protobuf gem. This makes it challenging to upgrade to newer versions of the OTLP Exporter, which only supports versions `~> 3.19`. This change loosens the restrictions to allow GitHub to adopt newer versions of the protobuf definitions using an older version of the library. This change explicitly skips 3.15 due to bugs like protocolbuffers/protobuf#8337 * squash: PR feedback from @simi Addresses #1500 (comment)
What version of protobuf and what language are you using?
Version: v3.15.1
Language: Ruby
What operating system (Linux, Windows, ...) and version?
macOS Big Sur Version 11.2.1
What runtime / compiler are you using (e.g., python version or gcc version)
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin19]
What did you do?
Used a minimal script with opentelemetry-ruby to recreate what was observed in production. The script requires a running collector to demonstrate the issue.
What did you expect to see
Using google-protobuf 3.14
What did you see instead?
Using google-protobuf 3.15.1
Anything else we should know about your project / environment
I have had multiple reports of application owners having to revert the google-protobuf gem from 3.15 to 3.14 after seeing their applications double their memory usage and hit their limits.