UniqueGenerator.clear is very slow #1186

kevinburkeomg · 2018-04-17T20:27:33Z

Faker::UniqueGenerator.clear calls ObjectSpace.each_object, which takes about 70ms per invocation, even if there are no objects. If you are calling Faker::UniqueGenerator.clear after every test, the 70ms penalty adds up pretty quickly.

Perhaps we could store an object_created boolean, and only call ObjectSpace.each_object if that boolean indicates an object has been created.
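A minimal sketch of that idea, using a hypothetical `FlaggedUniqueGenerator` stand-in rather than Faker's real class. The `instance_created` flag name and the class itself are assumptions for illustration only:

```ruby
require "set"

# Hypothetical stand-in for Faker::UniqueGenerator, reduced to the parts
# that matter for clearing. Not Faker's actual code.
class FlaggedUniqueGenerator
  # Class-level flag (assumed name): flips to true once any instance exists.
  @instance_created = false

  class << self
    attr_accessor :instance_created

    def clear
      # Skip the heap walk entirely when no generator was ever built.
      return unless instance_created

      ObjectSpace.each_object(self, &:clear)
    end
  end

  attr_reader :previous_results

  def initialize
    @previous_results = Hash.new { |hash, key| hash[key] = Set.new }
    self.class.instance_created = true
  end

  def clear
    @previous_results.clear
  end
end
```

Note that a flag like this only helps until the first generator is created; after that, every call to `clear` walks the heap again.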

For reference, here's the current contents of unique_generator.rb.

module Faker
  class UniqueGenerator
    def initialize(generator, max_retries)
      @generator = generator
      @max_retries = max_retries
      @previous_results = Hash.new { |hash, key| hash[key] = Set.new }
    end

    def method_missing(name, *arguments)
      @max_retries.times do
        result = @generator.public_send(name, *arguments)

        next if @previous_results[[name, arguments]].include?(result)

        @previous_results[[name, arguments]] << result
        return result
      end

      raise RetryLimitExceeded
    end

    RetryLimitExceeded = Class.new(StandardError)

    def clear
      @previous_results.clear
    end

    def self.clear
      ObjectSpace.each_object(self, &:clear)
    end
  end
end


knittingdev · 2018-04-28T07:10:39Z

Hiya @kevinburkeomg 👋

This problem really intrigued me. I like diving into performance problems, and this seemed like a great way to really understand how Faker works.

I tried a few different approaches to improve performance (including one that I think is similar to what you recommended):

[NO CLEAR EFFECT] Test 1 - Short Circuiting Clear if @previous_results is empty

    def clear
      return @previous_results if @previous_results.empty?
      @previous_results.clear
    end

    def self.clear
      ObjectSpace.each_object(self, &:clear)
    end

[NO CLEAR EFFECT] Test 2 - Skip Calling Clear if @previous_results.empty?
This was similar to Test 1, but worth a shot to see whether performance changed. It also makes it clearer how many objects were actually cleared.

    def clear
      return @previous_results if @previous_results.empty?
      @previous_results.clear
    end

    def self.clear
      cleared_obj = 0
      ObjectSpace.each_object(self) do |uniq_gen|
        next if uniq_gen.instance_variable_get(:@previous_results).empty?
        uniq_gen.clear
        cleared_obj += 1
      end
      cleared_obj
    end

[NO CLEAR EFFECT & TEST FAILURE] Test 3 - Change Hash Init and Clearing
This test grew out of benchmarking various ways to clear the previous_results hash.

require "benchmark"
require "set"

Benchmark.bm do |x|
  x.report { nums = 1000.times.map { |n| [n.to_s, n] }.to_h; nums.clear }
  x.report { nums = 1000.times.map { |n| [n.to_s, n] }.to_h; nums = {} }
  x.report { nums = 1000.times.map { |n| [n.to_s, n] }.to_h; nums = Hash.new }
  x.report { nums = 1000.times.map { |n| [n.to_s, n] }.to_h; nums = Hash.new(Set.new) }
  x.report { nums = 1000.times.map { |n| [n.to_s, n] }.to_h; nums = Hash.new { |hash, key| hash[key] = Set.new } }
end

Come to find out, changing this segment of code (clear and init of previous_results) appears to cause some test failures. I abandoned this after realizing it wasn't going to provide much benefit.
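The thread does not say which variant caused the failures. One plausible cause (an assumption) is the difference between `Hash.new(Set.new)` and the block form: the first shares a single default Set across all missing keys and never stores it. A small sketch of that difference in plain Ruby:

```ruby
require "set"

# Default *object*: every missing key returns the same Set, and no key is stored.
shared = Hash.new(Set.new)
shared[:name] << "Alice"
shared_leak = shared[:email].include?("Alice")        # true: :email sees :name's value
shared_size = shared.size                             # 0: nothing was ever stored
shared.clear
shared_after_clear = shared[:name].include?("Alice")  # true: clear cannot reset the default

# Default *block*: each missing key gets its own Set, stored on first access.
per_key = Hash.new { |hash, key| hash[key] = Set.new }
per_key[:name] << "Alice"
per_key_leak = per_key[:email].include?("Alice")        # false
per_key_size = per_key.size                             # 2: :name and :email
per_key.clear
per_key_after_clear = per_key[:name].include?("Alice")  # false

puts "shared default:  leak=#{shared_leak} size=#{shared_size} seen_after_clear=#{shared_after_clear}"
puts "per-key default: leak=#{per_key_leak} size=#{per_key_size} seen_after_clear=#{per_key_after_clear}"
```

With the shared default, values stay "seen" even after `clear`, which would break any test that expects uniqueness tracking to reset.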

Sadly, I can't seem to get around the current performance cost.

Is there a reason you need to run Faker::UniqueGenerator.clear after each test? Can you do it after a specific batch of tests (much like before(:all) in a group when testing in rspec)? This seems like it might save you some time on your test suite as there should be enough Faker data to accommodate a few tests before needing to clear.

Hopefully this was helpful. I'd love to hear any other ideas you have on how this might be accomplished.

Thanks~
Mary

kevinburkeomg · 2018-05-03T15:10:35Z

The intent was to clean up after any test that had a unique block; I ended up fixing the problem as you suggested by clearing in a before hook in some cases, and by removing Faker in a few others.

It's not obvious that it's going to be very slow to clear a few objects from a few hashes.
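A rough way to see where the time goes, assuming CRuby, where `ObjectSpace.each_object` scans the live heap to find matching instances. `Needle` is a hypothetical class with no instances, so any time measured is pure scanning cost; absolute timings will vary by machine and Ruby version:

```ruby
require "benchmark"

class Needle; end # hypothetical class; no instances are ever created

# Time one full scan for Needle instances.
def time_scan
  Benchmark.realtime { ObjectSpace.each_object(Needle) {} }
end

before = time_scan
ballast = Array.new(500_000) { Object.new } # unrelated objects grow the heap
after = time_scan
matches = ObjectSpace.each_object(Needle) {} # each_object returns the match count

puts format("scan before ballast: %.4fs", before)
puts format("scan after ballast:  %.4fs", after)
puts "matching objects: #{matches} (ballast size: #{ballast.size})"
```

The scan cost tends to follow the total number of live objects, not the number of generators, which is why clearing "a few objects" in a large test process can still be slow.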

kevinburkeomg · 2018-05-03T15:11:14Z

Perhaps Faker could store a list of pointers to every object that it's seen that had unique called on it, and then when you call clear it could traverse that list, instead of calling ObjectSpace.
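A minimal sketch of that registry idea, using a hypothetical `TrackedUniqueGenerator` modeled on the class quoted at the top of the thread. The `marked` registry name is an assumption, and this is not necessarily how Faker ends up implementing it:

```ruby
require "set"

# Hypothetical, simplified stand-in for Faker::UniqueGenerator.
class TrackedUniqueGenerator
  RetryLimitExceeded = Class.new(StandardError)

  # Registry (assumed name) of generators that have produced a unique value.
  @marked = Set.new

  class << self
    attr_reader :marked

    # Clear only the registered generators; no ObjectSpace walk needed.
    def clear
      marked.each(&:clear)
      marked.clear
    end
  end

  def initialize(generator, max_retries)
    @generator = generator
    @max_retries = max_retries
    @previous_results = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def method_missing(name, *arguments)
    # Register this generator the first time it is actually used.
    self.class.marked << self

    @max_retries.times do
      result = @generator.public_send(name, *arguments)

      next if @previous_results[[name, arguments]].include?(result)

      @previous_results[[name, arguments]] << result
      return result
    end

    raise RetryLimitExceeded
  end

  def respond_to_missing?(name, include_private = false)
    @generator.respond_to?(name, include_private) || super
  end

  def clear
    @previous_results.clear
  end
end
```

The cost of `clear` then depends on how many generators were used since the last clear, not on the size of the heap. One trade-off is that the registry holds strong references to generators until the next `clear`.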

knittingdev · 2018-05-05T00:50:04Z

I think that's probably a much better course of action. I had done a lot of experimenting with filtering the ObjectSpace down to just the Faker objects. Having Faker keep track of the objects it has created might be the best approach. I'll try to spend an hour on this tonight to see if I can measure any difference. Thanks for the idea!

MarcPer · 2018-05-24T06:30:04Z

Hi, @knittingdev and @kevinburkeomg. Sorry to barge in, but I was interested in speeding up Faker::UniqueGenerator.clear as well, and I ended up implementing in #1246 what was suggested here: keeping a set of the generators that have used unique and then only calling clear on those.

Below are some benchmarks (in all cases, I've labeled the current method using ObjectSpace as old_clear and the new method with the Set as just clear):

The first case calls Faker::Name.unique.name once per iteration and then calls Faker::UniqueGenerator.clear (or Faker::UniqueGenerator.old_clear):

require "benchmark"
require "faker"

def create_clear(num_times, num_unique = 1)
  num_times.times.each do
    num_unique.times { Faker::Name.unique.name }
    Faker::UniqueGenerator.clear
  end
end

def create_old_clear(num_times, num_unique = 1)
  num_times.times.each do
    num_unique.times { Faker::Name.unique.name }
    Faker::UniqueGenerator.old_clear
  end
end

Benchmark.bm do |x|
  x.report { create_clear(10000, 1) }
  x.report { create_old_clear(10000, 1) }
end
 
# user     system      total        real
# 1.306549   0.021276   1.327825 (  1.328652)
# 31.035302   0.555962  31.591264 ( 31.602384)

The second case calls clear without any unique method call on the generator:

Benchmark.bm do |x|
  x.report { create_clear(10000, 0) }
  x.report { create_old_clear(10000, 0) }
end

# user     system      total        real
# 0.007437   0.000000   0.007437 (  0.007428)
# 27.514668   0.024085  27.538753 ( 27.545627)

In the third case, I created multiple classes and called unique on each one, thereby adding multiple objects to the UniqueGenerator set:

require "benchmark"
require "faker"

def create_clear(num_times, num_unique = 1)
  num_times.times.each do
    num_unique.times { Class.new(Faker::Base).unique.name }
    Faker::UniqueGenerator.clear
  end
end

def create_old_clear(num_times, num_unique = 1)
  num_times.times.each do
    num_unique.times { Class.new(Faker::Base).unique.name }
    Faker::UniqueGenerator.old_clear
  end
end

Benchmark.bm do |x|
  x.report { create_clear(1000, 20) }
  x.report { create_old_clear(1000, 20) }
end

# user     system      total        real
# 1.405916   0.000000   1.405916 (  1.407707)
# 9.118875   0.004000   9.122875 (  9.124552)

Please let me know if any more tests could be useful.

vbrazo · 2018-05-26T22:00:44Z

Hmm interested and following the issue.

MarcPer · 2018-06-16T18:50:15Z

@knittingdev, @kevinburkeomg, @vbrazo, should the PR be merged? The change is simple enough and the performance improvement looks consistent.

vbrazo added the 💭 Discussion label May 12, 2018

MarcPer mentioned this issue May 24, 2018

Store list of generators with enabled uniqueness for faster clear #1246

Merged

vbrazo closed this as completed Jul 19, 2018

MarcPer mentioned this issue Sep 6, 2018

Fix global clear of unique values for Faker::UniqueGenerator #1355

Merged
