Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Node#each_recursive performance #139

Merged
merged 5 commits into from
Jun 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 40 additions & 0 deletions benchmark/each_recursive.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
loop_count: 100
contexts:
- gems:
rexml: 3.2.6
require: false
prelude: require 'rexml'
- name: master
prelude: |
$LOAD_PATH.unshift(File.expand_path("lib"))
require 'rexml'
- name: 3.2.6(YJIT)
gems:
rexml: 3.2.6
require: false
prelude: |
require 'rexml'
RubyVM::YJIT.enable
- name: master(YJIT)
prelude: |
$LOAD_PATH.unshift(File.expand_path("lib"))
require 'rexml'
RubyVM::YJIT.enable

prelude: |
require 'rexml/document'

xml_source = +"<root>"
100.times do
x_node_source = ""
100.times do
x_node_source = "<x>#{x_node_source}</x>"
end
xml_source << x_node_source
end
xml_source << "</root>"

document = REXML::Document.new(xml_source)

benchmark:
each_recursive: document.each_recursive { |_| }
12 changes: 8 additions & 4 deletions lib/rexml/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -52,10 +52,14 @@ def parent?

# Visit all subnodes of +self+ recursively
def each_recursive(&block) # :yields: node
self.elements.each {|node|
block.call(node)
node.each_recursive(&block)
}
stack = []
each { |child| stack.unshift child if child.node_type == :element }
until stack.empty?
child = stack.pop
yield child
n = stack.size
child.each { |grandchild| stack.insert n, grandchild if grandchild.node_type == :element }
end
Comment on lines -55 to +62
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we improve REXML::Elements#each(nil) case instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a possible improvement. But, IMHO, it can be split into another pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to change REXML::Node#each_recursive when we improve REXML::Elements#each(nil)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is best so like that we don't need to change it when we improve something.

I don't really like implementations that are optimized when using specific arguments. From that point of view, I'm not very keen on optimizing each(nil).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If elements.each(nil) is improved, we can use elements.each { |x| yield x } instead of each { |x| yield x if x.node_type == :element }.

However, I don't think it is a problem that there are some duplications of each { |x| yield x if x.node_type == :element } sometimes. Because elements.each(&) has some overheads and the performance is more important here.

Repeatedly, I talked about the performance. The maintainability and other matters are out of topic here because the speed of each_recursive is one of the crucial parts of the XML library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maintainability and other matters are out of topic here

As a maintainer, I don't agree it.
But I can understand your opinion as a heavy each_recursive user.

OK. Let's focus on only each_recursive in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your understanding!

end

# Find (and return) first subnode (recursively) for which the block
Expand Down
36 changes: 36 additions & 0 deletions test/test_document.rb
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,42 @@ def test_gt_linear_performance
end
end

def test_each_recursive
xml_source = <<~XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root name="root">
<x name="1_1">
<x name="1_2">
<x name="1_3" />
</x>
</x>
<x name="2_1">
<x name="2_2">
<x name="2_3" />
</x>
</x>
<!-- comment -->
<![CDATA[ cdata ]]>
</root>
XML

expected_names = %w[
root
1_1 1_2 1_3
2_1 2_2 2_3
]

document = REXML::Document.new(xml_source)

# Node#each_recursive iterates elements only.
# This does not iterate XML declerations, comments, attributes, CDATA sections, etc.
actual_names = []
document.each_recursive do |element|
actual_names << element.attributes["name"]
end
assert_equal(expected_names, actual_names)
end

class WriteTest < Test::Unit::TestCase
def setup
@document = REXML::Document.new(<<-EOX)
Expand Down