Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix quadratic performance in FileTask#out_of_date? #224

Merged
merged 1 commit into from
Oct 18, 2017
Merged

Conversation

doudou
Copy link
Contributor

@doudou doudou commented Sep 19, 2017

FileTask#out_of_date? has been changed in 462e403 to call #needed?
which calls #out_of_date? recursively. In some cases, where the
graph of dependencies is fairly dense, this leads to quadratic
performance when it was linear before.

Use #all_prerequisite_tasks to avoid the problem. This also saves
a File.exist? test as #timestamp already takes it into account.

@hsbt
Copy link
Member

hsbt commented Sep 21, 2017

Thanks. Can you show the benchmark results?

@doudou
Copy link
Contributor Author

doudou commented Sep 21, 2017

Thanks. Can you show the benchmark results?

I don't have them, it was reported by one of my coworker.

On our system, it went from our rake-based library doing 100% CPU for minutes to 2-3% CPU (the rest is taken by subcommands). It's so dramatic that I didn't see the need to isolate a benchmark.

If having a benchmark is a prerequisite, I'll try to take the time to do it.

@aycabta
Copy link
Member

aycabta commented Oct 4, 2017

I'm original writer of the #out_of_date? logic of now at #183. I understand this Pull Request. I think that this is smart arrangement and appropriate for the logic, I gues that it's enough reason to merge this.

However I agreed that benchmark is important for this. In case you didn't know yet, RDoc uses NOBENCHMARK env to enable benchmark test because CI server sometimes has a bumpy ride performance in the short term, if you want to add benchmark test.

@doudou
Copy link
Contributor Author

doudou commented Oct 5, 2017

Fair enough ... how should I provide this benchmark ? Rake doesn't seem to have a benchmark running facility as @aycabta mentions for RDoc. Just in this PR ?

FileTask#out_of_date? has been changed in 462e403 to call #needed?
which calls #out_of_date? recursively. In some cases, where the
graph of dependencies is fairly dense, this leads to quadratic
performance when it was linear before.

Use #all_prerequisite_tasks to avoid the problem. This also saves
a File.exist? test as #timestamp already takes it into account.fix quadratic performance in FileTask#out_of_date?

FileTask#out_of_date? has been changed in 462e403 to call #needed?
which calls #out_of_date? recursively. In some cases, where the
graph of dependencies is fairly dense, this leads to quadratic
performance when it was linear before.

Use #all_prerequisite_tasks to avoid the problem. This also saves
a File.exist? test as #timestamp already takes it into account.

I made a benchmark that measures the difference in duration for
out_of_date? on 12.0, 12.1 and 12.1 with this commit applied.

The benchmark used to validate the performance creates 5 layers
of FileTask, where all tasks of the parent layer are connected
to all tasks of the child. A root task is added at the top
and #out_of_date? is called on it.

The benchmark varies the number of tasks per layer.

**12.0**

| Tasks per layers | Duration (s) | Standard deviation |
|------------------|--------------|--------------------|
| 1                | 1.32e-05     | 0.96e-05           |
| 2                | 8.69e-05     | 1.11e-06           |
| 5                | 1.84e-05     | 2.43e-06           |
| 8                | 2.89e-05     | 1.05e-05           |
| 10               | 3.35e-05     | 4.12e-06           |
| 15               | 4.97e-05     | 6.74e-06           |
| 20               | 6.19e-05     | 6.23e-06           |

**12.1**

| Tasks per layers | Duration (s) | Standard deviation |
|------------------|--------------|--------------------|
| 1                | 7.00e-05     | 5.62e-05           |
| 2                | 3.98e-04     | 7.38e-05           |
| 5                | 2.32e-02     | 1.02e-03           |
| 8                | 0.22         | 0.006              |
| 10               | 0.65         | 0.006              |
| 15               | 4.78         | 0.048              |
| 20               | 20           | 0.49               |

**PR 224**

| Tasks per layers | Duration (s) | Standard deviation |
|------------------|--------------|--------------------|
| 1                | 4.47e-05     | 2.68e-05           |
| 2                | 7.56e-05     | 2.92e-05           |
| 5                | 2.42e-03     | 4.16e-05           |
| 8                | 0.51e-03     | 7.21e-05           |
| 10               | 0.77e-03     | 0.13e-03           |
| 15               | 14.2e-03     | 0.11e-03           |
| 20               | 24.2e-03     | 0.16e-03           |

Benchmarking code:

~~~ ruby
require 'rake'

LAYER_MAX_SIZE = 20
LAYER_COUNT = 5

def measure(size)
    app = Rake::Application.new
    layers = (0...LAYER_COUNT).map do |layer_i|
        (0...size).map do |i|
            app.define_task(Rake::FileTask, "#{layer_i}_#{i}")
        end
    end
    layers.each_cons(2) do |parent, child|
        child_names = child.map(&:name)
        parent.each { |t| t.enhance(child_names) }
    end
    root = app.define_task(Rake::FileTask, "root")
    root.enhance(layers[0].map(&:name))

    tic = Time.now
    root.send(:out_of_date?, tic)
    Time.now - tic
end

FileUtils.touch "root"
sleep 0.1
LAYER_COUNT.times do |layer_i|
    LAYER_MAX_SIZE.times do |i|
        FileUtils.touch "#{LAYER_COUNT - layer_i - 1}_#{i}"
    end
    sleep 0.1
end

COUNT = 100
[1, 2, 5, 8, 10, 15, 20].each do |size|
    mean = 0
    sum_squared_deviations = 0

    COUNT.times do |count|
        duration = measure(size)
        old_mean = mean
        mean     = old_mean + (duration - old_mean) / (count + 1)
        sum_squared_deviations = sum_squared_deviations + (duration - old_mean) * (duration - mean)
    end

    puts "#{size} #{mean} #{Math.sqrt(sum_squared_deviations / (COUNT - 1))}"
end
~~~
@doudou
Copy link
Contributor Author

doudou commented Oct 5, 2017

I've added the benchmark code and the results to the commit message

@aycabta
Copy link
Member

aycabta commented Oct 5, 2017

OK, RDoc uses the benchmark test at test/test_rdoc_context.rb and you can see documents for the assersions of MiniTest.

On the other hand, if you want to just take benchmark result as distinct from test, I think that benchmark-ips is best choice for it. It's standard for presentation of Rails performance with patch.

@aycabta
Copy link
Member

aycabta commented Oct 5, 2017

Oh, I was writing my comment but later than you.

@doudou
Copy link
Contributor Author

doudou commented Oct 17, 2017

Ping

@hsbt hsbt merged commit f04586d into ruby:master Oct 18, 2017
@hsbt
Copy link
Member

hsbt commented Oct 18, 2017

I heard details of this issue from @aycabta on Asakusa.rb meetup. I'm ok to merge this.

This was referenced Mar 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants