Handle ASCII-8BIT encoding on fragment input
If we don't know the encoding, we default to using UTF-8.  I think this
was correct behavior on 1.8, but now that Ruby is encoding aware, I
think we should raise an exception if someone passes a document encoded
as ASCII-8BIT.  However, to remain backwards compatible, we'll just
assume UTF-8 for now.

Fixes #553
tenderlove committed Feb 20, 2018
1 parent a705d15 commit 7539b14
Showing 2 changed files with 17 additions and 2 deletions.
12 changes: 11 additions & 1 deletion lib/nokogiri/html/document_fragment.rb
@@ -6,7 +6,17 @@ class DocumentFragment < Nokogiri::XML::DocumentFragment
       def self.parse tags, encoding = nil
         doc = HTML::Document.new

-        encoding ||= tags.respond_to?(:encoding) ? tags.encoding.name : 'UTF-8'
+        encoding ||= if tags.respond_to?(:encoding)
+                       encoding = tags.encoding
+                       if encoding == ::Encoding::ASCII_8BIT
+                         'UTF-8'
+                       else
+                         encoding.name
+                       end
+                     else
+                       'UTF-8'
+                     end
+
         doc.encoding = encoding

         new(doc, tags)
7 changes: 6 additions & 1 deletion test/html/test_document_fragment.rb
@@ -9,6 +9,11 @@ def setup
         @html = Nokogiri::HTML.parse(File.read(HTML_FILE), HTML_FILE)
       end

+      def test_ascii_8bit_encoding
+        s = String.new 'hello', encoding: Encoding::ASCII_8BIT
+        assert_equal "hello", Nokogiri::HTML::DocumentFragment.parse(s).to_html
+      end
+
       def test_inspect_encoding
         fragment = "<div>こんにちは!</div>".encode('EUC-JP')
         f = Nokogiri::HTML::DocumentFragment.parse fragment
@@ -21,7 +26,7 @@ def test_html_parse_encoding
         assert_equal 'EUC-JP', f.document.encoding
         assert_equal "こんにちは!", f.content
       end

       def test_unlink_empty_document
         frag = Nokogiri::HTML::DocumentFragment.parse('').unlink # must_not_raise
         assert_nil frag.parent
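
The fallback described in the commit message can be tried without Nokogiri installed. The following is a minimal sketch in plain Ruby, assuming a hypothetical helper named `resolve_fragment_encoding` that is not part of Nokogiri; it only mirrors the decision order in the changed `parse` method (explicit encoding first, then the string's own encoding, with ASCII-8BIT and unknown inputs treated as UTF-8).

```ruby
# Hypothetical helper for illustration only; Nokogiri does not define it.
# It mirrors the encoding fallback in DocumentFragment.parse after this commit.
def resolve_fragment_encoding(tags, encoding = nil)
  # An explicitly passed encoding always wins.
  return encoding if encoding

  # Inputs that cannot report an encoding fall back to UTF-8.
  return 'UTF-8' unless tags.respond_to?(:encoding)

  detected = tags.encoding
  # ASCII-8BIT (binary) says nothing about the real character set,
  # so it is treated the same as "unknown".
  detected == ::Encoding::ASCII_8BIT ? 'UTF-8' : detected.name
end

binary = String.new('hello', encoding: Encoding::ASCII_8BIT)
euc_jp = "<div>\u3053\u3093\u306B\u3061\u306F</div>".encode('EUC-JP')

puts resolve_fragment_encoding(binary)               # prints "UTF-8"
puts resolve_fragment_encoding(euc_jp)               # prints "EUC-JP"
puts resolve_fragment_encoding(binary, 'Shift_JIS')  # prints "Shift_JIS"
puts resolve_fragment_encoding(Object.new)           # prints "UTF-8"
```

Unlike the committed code, the sketch uses a separate local (`detected`) rather than reassigning `encoding` inside the conditional; the result is the same, and the separate name is only a readability choice.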