Skip to content

Commit

Permalink
Optimize BaseParser#unnormalize method to replace "\r\n" with "\n" on…
Browse files Browse the repository at this point in the history
…ly when "\r\n" is included

## Why?

See: ruby#158 (comment)

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.3/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     17.674      17.567        32.759       32.316 i/s -     100.000 times in 5.657973s 5.692371s 3.052595s 3.094448s
                 sax     25.261      25.377        48.889       49.911 i/s -     100.000 times in 3.958626s 3.940640s 2.045460s 2.003575s
                pull     28.968      29.121        61.584       61.774 i/s -     100.000 times in 3.452132s 3.433967s 1.623789s 1.618809s
              stream     28.395      28.803        55.289       57.970 i/s -     100.000 times in 3.521761s 3.471812s 1.808673s 1.725029s

Comparison:
                              dom
        before(YJIT):        32.8 i/s
         after(YJIT):        32.3 i/s - 1.01x  slower
              before:        17.7 i/s - 1.85x  slower
               after:        17.6 i/s - 1.86x  slower

                              sax
         after(YJIT):        49.9 i/s
        before(YJIT):        48.9 i/s - 1.02x  slower
               after:        25.4 i/s - 1.97x  slower
              before:        25.3 i/s - 1.98x  slower

                             pull
         after(YJIT):        61.8 i/s
        before(YJIT):        61.6 i/s - 1.00x  slower
               after:        29.1 i/s - 2.12x  slower
              before:        29.0 i/s - 2.13x  slower

                           stream
         after(YJIT):        58.0 i/s
        before(YJIT):        55.3 i/s - 1.05x  slower
               after:        28.8 i/s - 2.01x  slower
              before:        28.4 i/s - 2.04x  slower

```

- YJIT=ON : 0.98x - 1.05x faster
- YJIT=OFF : 0.98x - 1.02x faster

---------

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
  • Loading branch information
naitoh and kou committed Jun 25, 2024
1 parent 78b2913 commit 97e90ce
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 2 deletions.
6 changes: 5 additions & 1 deletion lib/rexml/parsers/baseparser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -511,7 +511,11 @@ def normalize( input, entities=nil, entity_filter=nil )

# Unescapes all possible entities
def unnormalize( string, entities=nil, filter=nil )
rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
if string.include?("\r")
rv = string.gsub( Private::CARRIAGE_RETURN_NEWLINE_PATTERN, "\n" )
else
rv = string.dup
end
matches = rv.scan( REFERENCE_RE )
return rv if matches.size == 0
rv.gsub!( Private::CHARACTER_REFERENCES ) {
Expand Down
6 changes: 5 additions & 1 deletion test/test_pullparser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -63,20 +63,24 @@ def test_entity_replacement
end

def test_character_references
source = '<a>&#65;</a><b>&#x42;</b>'
source = "<a>&#65;</a>\r\n<b>&#x42;</b>"
parser = REXML::Parsers::PullParser.new( source )
element_name = ''
while parser.has_next?
event = parser.pull
case event.event_type
when :start_element
element_name = event[0]
when :end_element
element_name = nil
when :text
case element_name
when 'a'
assert_equal('A', event[1])
when 'b'
assert_equal('B', event[1])
else
assert_equal("\n", event[1])
end
end
end
Expand Down

0 comments on commit 97e90ce

Please sign in to comment.