Commit

If the size of the content parsed by StringScanner to parse huge XML exceeds a certain size, have it removed.

See: #150
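
The idea described above can be sketched in isolation with Ruby's standard `strscan` library. This is a minimal illustration and not the REXML code itself: the helper name `drop_consumed` and the small default threshold of 10 are assumptions made for the example (the commit uses `SCANNER_RESET_SIZE = 100000`). It relies on `StringScanner#string=` resetting the scan position to 0, so assigning `scanner.rest` leaves only the unparsed remainder in the scanner.

```ruby
require "strscan"

# Hypothetical helper for this sketch: once more than `threshold` bytes
# have been consumed, replace the scanner's string with the unparsed rest.
# StringScanner#string= also resets the scan position to 0.
def drop_consumed(scanner, threshold: 10)
  scanner.string = scanner.rest if scanner.pos > threshold
  scanner
end

scanner = StringScanner.new("<a>" + "x" * 20 + "</a>")
scanner.scan(/<a>x+/)       # consume the opening tag and the text
drop_consumed(scanner)      # the consumed prefix is discarded here

puts scanner.pos            # position after the reset
puts scanner.string         # only the unparsed remainder is kept
```

Below the threshold the scanner is left untouched, so small documents do not pay for the extra string copy.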
naitoh committed Jun 19, 2024
1 parent f704011 commit 70bce7e
Showing 3 changed files with 32 additions and 0 deletions.
2 changes: 2 additions & 0 deletions lib/rexml/parsers/baseparser.rb
@@ -204,6 +204,8 @@ def peek depth=0

      # Returns the next event. This is a +PullEvent+ object.
      def pull
        @source.drop_parsed_content

        pull_event.tap do |event|
          @listeners.each do |listener|
            listener.receive event
7 changes: 7 additions & 0 deletions lib/rexml/source.rb
@@ -55,6 +55,7 @@ class Source
    attr_reader :encoding

    module Private
      SCANNER_RESET_SIZE = 100000
      PRE_DEFINED_TERM_PATTERNS = {}
      pre_defined_terms = ["'", '"', "<"]
      pre_defined_terms.each do |term|
@@ -84,6 +85,12 @@ def buffer
      @scanner.rest
    end

    def drop_parsed_content
      if @scanner.pos > SCANNER_RESET_SIZE
        @scanner.string = @scanner.rest
      end
    end

    def buffer_encoding=(encoding)
      @scanner.string.force_encoding(encoding)
    end
23 changes: 23 additions & 0 deletions test/test_pullparser.rb
@@ -98,5 +98,28 @@ def test_peek
      end
      assert_equal( 0, names.length )
    end

    N_ELEMENTS = 50000
    N_STRING = 'a' * 50000
    def build_xml(n_elements)
      xml = '<?xml version="1.0"?><root>'

      n_elements.times do |i|
        xml << '<child >'
        xml << N_STRING
        xml << '</child>'
      end
      xml << '</root>'
    end

    # NOTE: this test is too slow.
    def test_parse_large_xml
      xml = build_xml(N_ELEMENTS)

      parser = REXML::Parsers::PullParser.new(xml)
      while parser.has_next?
        parser.pull
      end
    end
  end
end
