Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML::DocumentFragment.parse should be able to accept an IO object #2069

Closed
flavorjones opened this issue Aug 29, 2020 · 2 comments · Fixed by #3298
Closed

HTML::DocumentFragment.parse should be able to accept an IO object #2069

flavorjones opened this issue Aug 29, 2020 · 2 comments · Fixed by #3298
Milestone

Comments

@flavorjones
Copy link
Member

flavorjones commented Aug 29, 2020

Describe the bug

HTML4::Document.parse, HTML5::Document.parse, and HTML5::DocumentFragment.parse all accept IO objects.

But HTML4::DocumentFragment.parse raises an unobvious exception when we do this.

To Reproduce

#! /usr/bin/env ruby
puts Nokogiri::HTML4::DocumentFragment.parse(StringIO.new("<div>hello</div>"))

emits:

/home/flavorjones/code/oss/nokogiri/lib/nokogiri/html4/document_fragment.rb:40:in `match?': no implicit conversion of StringIO into String (TypeError)

          path = if /^\s*?<body/i.match?(tags)
                                         ^^^^
        from /home/flavorjones/code/oss/nokogiri/lib/nokogiri/html4/document_fragment.rb:40:in `initialize'
        from /home/flavorjones/code/oss/nokogiri/lib/nokogiri/html4/document_fragment.rb:24:in `new'
        from /home/flavorjones/code/oss/nokogiri/lib/nokogiri/html4/document_fragment.rb:24:in `parse'
        from (irb):5:in `<main>'

Expected behavior

I'd like the HTML4::DocumentFragment.parse method(s) be able to handle an IO object, even if that means we just #read the whole string into a String.

@flavorjones flavorjones changed the title HTML::DocumentFragement.parse should be able to accept an IO object HTML::DocumentFragment.parse should be able to accept an IO object Jul 2, 2024
@flavorjones flavorjones added this to the v1.18.0 milestone Jul 2, 2024
@flavorjones
Copy link
Member Author

@sharvy This might be a good first issue! There shouldn't be any C coding necessary, and it should be straightforward to write a test based on the reproduction script above.

I think the solution is basically something like what we have in the read_and_encode method in lib/nokogiri/html5.rb:

if tags.respond_to?(:read)
  tags = if encoding.nil?
    tags.read
  else
    tags.read(encoding: encoding)
  end
end

and I'd want to test both branches of that (with and without encoding specified).

@flavorjones flavorjones modified the milestones: v1.18.0, v1.17.0 Jul 15, 2024
@sharvy
Copy link
Contributor

sharvy commented Jul 16, 2024

@flavorjones Thanks. Looks like a quick win. I'm on it.

sharvy added a commit to sharvy/nokogiri that referenced this issue Aug 1, 2024
Previously an exception (TypeError: no implicit conversion of File into String) would be raised.

Fixes: sparklemotion#2069
sharvy added a commit to sharvy/nokogiri that referenced this issue Aug 1, 2024
Previously an exception (TypeError: no implicit conversion of File into String) would be raised.

Fixes: sparklemotion#2069
flavorjones added a commit that referenced this issue Aug 5, 2024
**What problem is this PR intended to solve?**

Previously an exception (TypeError: no implicit conversion of File into String) would be raised.

Fixes: #2069 

**Have you included adequate test coverage?**

Yes.

**Does this change affect the behavior of either the C or the Java
implementations?**

No.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants