[bug] Empty HTML4 DocumentFragment serialization doesn't respect encoding #2649
sgoedecke added the state/needs-triage label (Inbox for non-installation-related bug reports or help requests) on Sep 19, 2022
Hello! Thank you for opening this issue, this is most certainly a bug and I appreciate you reporting it and diagnosing it. I'll schedule some time to fix it!
flavorjones added the topic/encoding label and removed the state/needs-triage label (Inbox for non-installation-related bug reports or help requests) on Sep 19, 2022
And your diagnosis seems right on:
Prior issues in Ruby:
so this behavior of
flavorjones added a commit that referenced this issue on Sep 19, 2022:
and improve test coverage around fragment encoding Closes #2649
See #2650 for a proposed fix.
Wow, so quick! Thank you ❤️
Please describe the bug
Parsing and serializing an HTML4::DocumentFragment produces a UTF-8 string by default. However, when the input string is empty, it produces a US-ASCII-encoded string instead, regardless of the encoding option passed.
Let me know if this is actually expected behaviour and something I should be working around!
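If a workaround is needed before a fix ships, one possible caller-side approach is to relabel empty results with the expected encoding. This is a minimal sketch, not the reporter's script: the helper name `ensure_empty_encoding` is hypothetical and not part of Nokogiri, and `[].join` stands in for the serialized empty fragment.

```ruby
# Hypothetical helper (not part of Nokogiri): give an empty serialization
# result the encoding the caller expects. Non-empty strings pass through
# untouched, since they already carry the encoding of the serialized nodes.
def ensure_empty_encoding(str, encoding)
  return str unless str.empty?

  # An empty string contains no bytes, so relabeling it cannot corrupt data.
  String.new("", encoding: encoding)
end

# [].join stands in for serializing an empty fragment (an assumption for illustration).
fixed = ensure_empty_encoding([].join, Encoding::UTF_8)
puts fixed.encoding # UTF-8

kept = ensure_empty_encoding("<p>x</p>", Encoding::UTF_8)
puts kept # <p>x</p>
```

In real code the first argument would be the fragment's serialized output (for example the result of `to_html`). This only patches the symptom on the caller's side and would become unnecessary once the library handles the empty case itself.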
Help us reproduce what you're seeing
This script produces a failing test:
Output:
Environment
Additional information
From what I can tell, the issue comes from here: https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/xml/node_set.rb#L283
When there's a single node (which there always is for proper documents), that node sets the encoding properly. But when there are no nodes, we're effectively returning [].join, which is US-ASCII encoded.
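The diagnosis can be checked in plain Ruby without Nokogiri. This minimal sketch shows only the core behavior: joining an empty array yields an empty US-ASCII string, while joining UTF-8 parts yields a UTF-8 string. The sample markup is made up for illustration.

```ruby
# Plain-Ruby check of the behavior described above (no Nokogiri needed).
# Array#join on an empty array returns an empty US-ASCII string.
empty = [].join

# Joining UTF-8 parts (sample markup, made up for illustration) yields UTF-8.
non_empty = ["<p>h\u00e9llo</p>"].join

puts empty.encoding     # US-ASCII
puts non_empty.encoding # UTF-8
```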