Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_child seems very slow under pure java implementation #692

Closed
ggodet-bar opened this issue Jun 5, 2012 · 9 comments
Closed

add_child seems very slow under pure java implementation #692

ggodet-bar opened this issue Jun 5, 2012 · 9 comments

Comments

@ggodet-bar
Copy link

Hi,

I'm running the following script

require 'nokogiri'
require 'benchmark'

puts Nokogiri::VERSION

doc = Nokogiri::XML::Document.new
root = Nokogiri::XML::Node.new('root', doc)

Benchmark.bm { |x|
  x.report {
    10000.times do |i|
      root << Nokogiri::XML::Node.new('randomNode', doc)
    end
  }
}

Running under 1.9.3-p125 returns the following

1.5.2
       user     system      total        real
   0.030000   0.000000   0.030000 (  0.027721)

... while running on JRuby (1.6.7, 1.7.0.preview1 and head) return uniformly

1.5.2
       user     system      total        real
  21.880000   0.100000  21.980000 ( 21.259000)

When running JRuby's profiler, it turns out that 95% of these twenty-something seconds is spent in the add_child method.

This might be an unfortunate but natural side-effect of using the Xerces library, but I haven't found any explicit documentation about this.

@yokolet
Copy link
Member

yokolet commented Jun 5, 2012

Hmmm. Yes, pure Java version is really really slow. There might be something to improve performance. So, I'll have a look.

@ggodet-bar
Copy link
Author

This might help: launching the same script with the Java profiler turned on (-J-Xprof flag) returns the following

1.5.2
       user     system      total        real
  22.480000   0.200000  22.680000 ( 22.036000)

Flat profile of 22.92 secs (2210 total ticks): JRubyJIT-1

  Interpreted + native   Method                        
100.0%     1  +     0    org.jruby.compiler.impl.BaseBodyCompiler.loadNil
100.0%     1  +     0    Total interpreted

  Thread-local ticks:
100.0%  2209             Blocked (of total)


Flat profile of 22.87 secs (2205 total ticks): JRubyJIT-2

  Interpreted + native   Method                        
 33.3%     0  +     1    java.lang.System.nanoTime
 33.3%     0  +     1    java.lang.Class.getName0
 33.3%     1  +     0    org.jruby.compiler.ASTCompiler.compileNext
100.0%     1  +     2    Total interpreted

  Thread-local ticks:
 99.9%  2202             Blocked (of total)


Flat profile of 24.08 secs (2322 total ticks): main

  Interpreted + native   Method                        
  1.5%    35  +     0    org.jruby.RubyArray.toJavaArray
  1.5%    35  +     0    org.jruby.javasupport.util.RuntimeHelpers.restructureBlockArgs19
  1.3%    31  +     0    nokogiri.XmlNode.adoptAsChild
  0.6%    15  +     0    org.jruby.lexer.yacc.RubyYaccLexer.yylex
  0.6%    14  +     0    org.jruby.internal.runtime.methods.DynamicMethod.call
  0.4%    10  +     0    org.jruby.RubyBasicObject.getVariableTableForWriteAtomic
  0.2%     0  +     5    java.io.UnixFileSystem.getBooleanAttributes0
  0.2%     5  +     0    org.jruby.RubyArray.<init>
  0.2%     0  +     4    sun.misc.Unsafe.getObject
  0.2%     0  +     4    sun.misc.Unsafe.compareAndSwapLong
  0.2%     4  +     0    org.jruby.RubyArray.first
  0.2%     4  +     0    java.util.ArrayList.<init>
  0.2%     4  +     0    org.apache.xerces.dom.CoreDocumentImpl.callUserDataHandlers
  0.2%     4  +     0    org.jruby.lexer.yacc.RubyYaccLexer.getIdentifier
  0.2%     4  +     0    java.util.regex.Pattern.compile
  0.1%     0  +     3    java.lang.System.nanoTime
  0.1%     0  +     3    java.lang.ClassLoader.findLoadedClass0
  0.1%     3  +     0    java.nio.HeapByteBuffer.<init>
  0.1%     3  +     0    java.util.Arrays.copyOf
  0.1%     3  +     0    org.jruby.parser.Ruby19Parser.<clinit>
  0.1%     3  +     0    rubyjit.Nokogiri::XML::Node#add_child_6F61CFDE621E27B3F04E7A09FE6EC91CFF43E33C3809363.__file__
  0.1%     0  +     2    java.lang.Thread.currentThread
  0.1%     0  +     2    java.lang.String.intern
  0.1%     0  +     2    java.lang.Object.getClass
  0.1%     0  +     2    java.io.UnixFileSystem.canonicalize0
 13.7%   273  +    46    Total interpreted (including elided)

     Compiled + native   Method                        
 22.9%   531  +     0    org.apache.xerces.dom.CoreDocumentImpl.renameNode
 20.3%   471  +     0    org.apache.xerces.dom.NodeImpl.getUserData
 20.0%   460  +     4    org.apache.xerces.dom.CoreDocumentImpl.callUserDataHandlers
 14.5%   337  +     0    org.apache.xerces.dom.NodeImpl.lookupNamespaceURI
  5.3%   122  +     0    nokogiri.internals.NokogiriHelpers.nodeListToRubyArray
  0.6%    15  +     0    nokogiri.XmlNodeSet.relink_namespace
  0.0%     1  +     0    UncommonTrapBlob
  0.0%     0  +     1    org.jruby.lexer.yacc.RubyYaccLexer.yylex
 83.6%  1937  +     5    Total compiled

         Stub + native   Method                        
  0.5%     0  +    12    java.lang.Object.getClass
  0.4%     0  +    10    java.lang.System.arraycopy
  0.2%     0  +     5    java.lang.Object.clone
  0.2%     0  +     5    java.lang.Object.hashCode
  0.2%     0  +     4    java.lang.String.intern
  0.1%     0  +     2    java.util.zip.Inflater.inflateBytes
  0.0%     0  +     1    java.lang.Thread.currentThread
  0.0%     0  +     1    java.lang.Class.isPrimitive
  0.0%     0  +     1    java.lang.Class.getClassLoader0
  0.0%     0  +     1    sun.misc.Unsafe.getObjectVolatile
  0.0%     0  +     1    sun.misc.Unsafe.compareAndSwapInt
  1.9%     0  +    43    Total stub

  Thread-local ticks:
  0.8%    18             Class loader


Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 24.09 seconds:
100.0%  2334             Received ticks
  0.4%     9             Received GC ticks
 11.3%   263             Compilation
  0.0%     1             Other VM operations
  0.8%    18             Class loader

@gstark
Copy link

gstark commented Sep 6, 2012

It seems while add_child is slow, add_next_sibling to the last child of a parent is not. Here is some benchmarking code showing builder usage, direct Doc manipulation, using add_next_sibling and an approach that adds a first child followed by using add_next_sibling. I'm not yet sure why add_next_sibling to the last element of a parent is faster than calling add_child (or if there is any side effect I'm not aware of)

require 'nokogiri'
require 'digest'
require 'benchmark'

XML_ELEMENT_COUNT=500
BENCHMARK_LOOP_COUNT=500

# Plain old builder...
def builder
  xml = Nokogiri::XML::Builder.new
  xml.main do
    XML_ELEMENT_COUNT.times do
      xml.foobar({:test => 42, :thing => "other thing"})
    end
  end

  xml
end

# Adds the first element with add_child, then uses add_next_sibling
def direct_sibling
  doc = Nokogiri::XML::Document.new
  root_node = doc << doc.create_element("main")

  first_node = root_node.add_child doc.create_element("foobar", {:test => 42, :thing => "other thing"})
  (XML_ELEMENT_COUNT-1).times do
    first_node.add_next_sibling doc.create_element("foobar", {:test => 42, :thing => "other thing"})
  end

  doc
end

# Helper method to more intelligently add a child node
def add_smart(parent, new_node)
  last_element = parent.last_element_child
  if last_element
    last_element.add_next_sibling(new_node)
  else
    parent << new_node
  end
end

# Adds child nodes either as next sibling to the last element or as a child
def direct_sibling_smart
  doc = Nokogiri::XML::Document.new
  root_node = doc << doc.create_element("main")

  XML_ELEMENT_COUNT.times do
    new_node = doc.create_element("foobar", {:test => 42, :thing => "other thing"})
    add_smart(root_node, new_node)
  end

  doc
end

# 100% direct XML::Document usage
def direct
  doc = Nokogiri::XML::Document.new
  root_node = doc << doc.create_element("main")

  XML_ELEMENT_COUNT.times do
    root_node << doc.create_element("foobar", {:test => 42, :thing => "other thing"})
  end

  doc
end

Benchmark.bm(1) do |bm|
  bm.report("smart:")   { BENCHMARK_LOOP_COUNT.times { direct_sibling_smart } }
  bm.report("sibling:") { BENCHMARK_LOOP_COUNT.times { direct_sibling } }
  bm.report("direct:")  { BENCHMARK_LOOP_COUNT.times { direct } }
  bm.report("builder:") { BENCHMARK_LOOP_COUNT.times { builder } }
end

Here are some benchmarks:

JRuby 1.6.7 + Nokogiri 1.5.5

       user     system      total        real
smart:  5.776000   0.000000   5.776000 (  5.777000)
sibling:  2.568000   0.000000   2.568000 (  2.567000)
direct: 60.426000   0.000000  60.426000 ( 60.426000)
builder: 62.780000   0.000000  62.780000 ( 62.780000)

MRI 1.9.3-p194 + Nokogiri 1.5.5
        user     system      total        real
smart:  3.580000   0.010000   3.590000 (  3.592824)
sibling:  3.450000   0.000000   3.450000 (  3.448539)
direct:  3.390000   0.010000   3.400000 (  3.400534)
builder:  4.700000   0.010000   4.710000 (  4.709923)

@yokolet
Copy link
Member

yokolet commented Sep 7, 2012

Ah, I figured out. This is a negative effect of a hach to make pure Java Nokogiri work like libxml version. Apache Xerces doesn't allow to add a bare text, so we wrap the text by a temporary node. After adding a given text with the wrapper node, pure Java Nokogiri takes the wrapper node out from DOM tree. add_child does this hack while add_next_sibling doesn't. I guess this is the reason of the difference in performance.

I'll hava a look whether I can improve this logic.

@gstark
Copy link

gstark commented Sep 7, 2012

I also noticed that simply removing the call to relink_namespace makes a drastic difference in performance but introduces test failures.

I'm not sure the nokogiri_text_wrapper hack is being invoked in either of our examples.

@flavorjones
Copy link
Member

@yokolet @jvshahid This still appears to be a poorly performing method.

# ruby 2.3.0
1.6.7.2
       user     system      total        real
   0.020000   0.000000   0.020000 (  0.020321)
# jruby 9.0.4.0
1.6.7.2
       user     system      total        real
  84.110000   0.090000  84.200000 ( 83.353718)

Any thoughts?

@jvshahid
Copy link
Member

I'm not familiar with this issue, but i have the benchmark on my dev machine and will investigate it.

@jvshahid
Copy link
Member

Verified that the call to relink_namepsace is taking long time. I also attached an image of a profiling session. Removing out the relink_namespace call (as pointed out previously) improves the performance dramatically as shown below:

       user     system      total        real
   0.230000   0.000000   0.230000 (  0.190405)

snapshot-1455818721109-cpu-calltree

@jvshahid
Copy link
Member

jvshahid commented Mar 9, 2016

I finally got time to figure out what's going on. It seems like the JRuby implementation had a bug that caused it to run in O(n2). The relink_namespace method was called on the root node every time a new node is added as a child of root. I modified the code locally and it seems to have fixed the slowness issue. I will take the opportunity to clean up the relink_namespace code and document it. i also need to figure out a way to test this behavior in order to prevent regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants