New architecture proposal to reduce memory usage #8

radarek · 2021-09-29T23:03:28Z

Hello.

I noticed that the unicode-emoji gem takes more memory than I expected from such a library. Just requiring the gem takes 7-8MB. I know that by today's standards this isn't a huge amount, but if many such gems were used, it could add up to unnecessary memory usage.

Here is the method I used to measure memory usage (I use the get_process_mem gem):

require 'get_process_mem'

def mem(&block)
  raise ArgumentError, 'missing block' unless block

  mem = GetProcessMem.new
  before = mem.mb
  block.call
  after = mem.mb
  return after - before
end

puts mem { require "unicode/emoji" }

Running this script multiple times gives me numbers between 7MB and 9MB (most of the time around 7.6MB).

I also used memory_profiler:

require 'memory_profiler'
report = MemoryProfiler.report do
  require 'unicode/emoji'
end

report.pretty_print

It reports where memory is allocated and also how much of it is retained (which in most cases means it will never be freed).

I'm not sure exactly how people use this gem, but looking at its content I suspect that most of them use one of the provided regex constants. That is the case in the application I work on: we literally use a single regex from this library (Unicode::Emoji::REGEX).

What can be done to lower memory usage? Here is the idea:

- instead of generating all the Regex constants at load time, they could be generated offline and included directly in a file
- every constant could go into a different file, and autoload :FOO, File.expand_path('emoji/foo', __dir__) could be used to lazy load it when it is used
- INDEX could be lazy loaded too. If all regexes were generated offline, then it would only be used by methods like properties. No method calls? No constant loaded.
- some regexes are quite big (require "objspace"; ObjectSpace.memsize_of(...)), for example REGEX_VALID_INCLUDE_TEXT is almost 0.5MB. I didn't look closely, but I think that some big unions like "char1|char2|...|charN" could be replaced by a range "[char1-charN]" (if it is a sequence of consecutive characters, of course).
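
To make the autoload idea concrete, here is a minimal sketch of how a constant can stay unloaded until it is first referenced. The module name EmojiSketch, the constant REGEX_BASIC and the temporary file are hypothetical stand-ins, not the gem's real layout; only Module#autoload and Module#autoload? are actual Ruby API.

```ruby
require "tmpdir"

# Hypothetical namespace standing in for the gem's module.
module EmojiSketch
end

pending_before, matched, pending_after = Dir.mktmpdir("emoji_sketch") do |dir|
  # Simulate a pregenerated file that holds exactly one regex constant.
  path = File.join(dir, "regex_basic.rb")
  File.write(path, <<~'RUBY')
    module EmojiSketch
      REGEX_BASIC = /[\u{1F600}-\u{1F64F}]/
    end
  RUBY

  # Register the constant; the file is not read yet.
  EmojiSketch.autoload(:REGEX_BASIC, path)

  before = EmojiSketch.autoload?(:REGEX_BASIC)          # path still pending
  hit = EmojiSketch::REGEX_BASIC.match?("\u{1F600}")    # first use loads the file
  after = EmojiSketch.autoload?(:REGEX_BASIC)           # nil once loaded
  [before, hit, after]
end

puts "pending before first use: #{!pending_before.nil?}"
puts "matched: #{matched}"
puts "pending after first use: #{!pending_after.nil?}"
```

A program that never references the constant never pays for parsing its file or compiling its regex, which is the effect the proposal is after.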

If done properly, then for a usage scenario like mine (a single constant), memory usage would be reduced from 7-8MB to the size of that constant (in our case 120kB).

Do you think it is worth looking into?
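
For reference, a per-constant size like the one quoted here can be estimated with ObjectSpace.memsize_of from the objspace standard library. The sketch below assumes CRuby (other implementations may report 0), and the helper memsize_kb and both sample regexes are hypothetical illustrations, not constants from the gem.

```ruby
require "objspace"

# Hypothetical helper: approximate size of one object in kB.
# memsize_of is an implementation-specific hint, not an exact figure.
def memsize_kb(object)
  (ObjectSpace.memsize_of(object) / 1024.0).round(1)
end

# Two sample regexes covering the same code points in different ways.
small_regex = /[\u{1F600}-\u{1F64F}]/
alternatives = (0x1F600..0x1F64F).map { |cp| [cp].pack("U") }
union_regex = Regexp.union(alternatives)

puts "character class: #{memsize_kb(small_regex)} kB"
puts "union:           #{memsize_kb(union_regex)} kB"
```

The numbers depend on the Ruby version and platform, so they are best used to compare candidates against each other rather than as absolute values.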


janlelis · 2021-09-30T20:47:33Z

Hi @radarek,

thanks for bringing this up and doing the research. I think it would be great to optimize the index structure and memory behavior for typical use cases.

Some feedback to your thoughts:

> instead of generating all the Regex constants, they could be generated offline and included directly in a file

Sounds good.

> every constant could go to different file and autoload :FOO, File.expand_path('emoji/foo', __dir__) could be used to lazy load it when it is used
>
> INDEX could be lazy loaded too. If all regexes were generated offline then it would be only used by methods like properties. No method calls? No constant loaded.

Is autoload currently encouraged? I have always liked it, and if concurrency issues can be ruled out, I am all for it.

> some regexes are quite big (require "objspace"; ObjectSpace.memsize_of(...)), for example REGEX_VALID_INCLUDE_TEXT is almost 0.5MB. I didn't look closely but I think that some big unions like "char1|char2|...|charN" could be replaced by range "[char1-charN]" (if it is sequence of subsequent characters of course).

I haven't looked into optimizing the generated regexes, so this sounds exciting.
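
One possible shape for that optimization, sketched under assumptions: collapse runs of consecutive code points into ranges inside a single character class. The helper name compact_char_class and the sample code points are hypothetical, and this only applies to single-code-point alternatives, not to multi-code-point emoji sequences.

```ruby
# Hypothetical helper: turn a list of code points into a compact
# character class source, merging consecutive code points into ranges.
def compact_char_class(codepoints)
  raise ArgumentError, "no code points given" if codepoints.empty?

  # Split the sorted list wherever two neighbours are not consecutive.
  runs = codepoints.uniq.sort.slice_when { |a, b| b != a + 1 }

  body = runs.map do |run|
    first = format('\u{%X}', run.first)
    last = format('\u{%X}', run.last)
    run.size == 1 ? first : "#{first}-#{last}"
  end

  "[#{body.join}]"
end

sample = [0x1F602, 0x1F600, 0x1F601, 0x1F60A]
source = compact_char_class(sample)
regex = Regexp.new(source)

puts source
puts regex.match?("\u{1F601}")
```

Writing the members as \u{...} escapes rather than literal characters avoids trouble with code points such as "-" or "]" that have a special meaning inside a character class.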

radarek · 2021-10-03T19:25:54Z

Hi @janlelis

> thanks for bringing this up and doing the research. I think it would be great to optimize the index structure and memory behavior for typical use cases.

So far I'm focused on lazy-loaded constants and optimizing the size of the regexes. The index could indeed be optimized too.

> Is the autoload currently encouraged? I have always liked it, and if concurrency issues can be ruled out, I am all for it

I had the same doubt, but it looks like it is still a valid way to load Ruby code:

- https://bugs.ruby-lang.org/issues/921 - an old issue about the thread unsafety of autoloading in Ruby, but it was resolved more than 10 years ago
- there was a plan to remove it from the language, but Matz decided not to do this (at least not in the near future). See https://bugs.ruby-lang.org/issues/5653
- it is used by Rails itself (Zeitwerk uses autoload under the hood)
- autoload is used in Bundler: https://github.com/rubygems/bundler/search?q=autoload
- autoload is used in Ruby's core/stdlib

Given all of the above, I think it is safe to use.

janlelis · 2021-10-04T07:38:23Z

Great, thank you for these links! Looking forward to getting #9 merged.

janlelis · 2021-10-06T19:33:42Z

Released with v3.0.0!

radarek mentioned this issue Sep 30, 2021

Make all regexes constants lazy loaded from pregenerated files #9

Merged

janlelis closed this as completed Oct 6, 2021
