ParseException could not get message when xml with invalid characters #29
Comments
Could you show a Ruby script and XML that reproduce this problem?
My XML file contains invalidly encoded characters. Part of the XML file is:
My Ruby script is simple:
The XML file is UTF-8 encoded, and I know it contains invalid characters. After I load the file, Ruby raises `<REXML::ParseException: #<ArgumentError: invalid byte sequence in UTF-8>`, and I can't get the exact error information from the exception message. If I temporarily change line 32 of the ParseException `to_s` method to use UTF-8, like this:
Here's a very simple reproduction of this bug (the Base64 encoding is only there to make sure the special characters in the string come through intact):

    require 'rexml/document'
    require 'base64'
    include REXML
    begin
      REXML::Document.new(Base64.decode64("YT08YSDigIs+4oCL\n"))
      # Equivalent to (the payload contains two U+200B zero-width spaces):
      # REXML::Document.new "a=<a \u200B>\u200B"
    rescue => e
      e.to_s
    end

The input is invalid XML and rightly triggers a parse error. It looks like this is a bug in this line:

    err << @source.buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')
…etrieved if the error content contained Unicode characters.
## Why?
If the XML tag contains Unicode characters when the error occurs, an `Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT` exception is raised and the ParseException error message cannot be retrieved.
See: ruby#29

…alid encoding XML (#123)
## Why?
If the XML tag contains Unicode characters and an error occurs for the tag, an incompatible encoding error is raised, because our parse exception message has a UTF-8 part (which includes the target tag information) and an ASCII-8BIT part (which includes the error context input).
Fix GH-29
Reported by DuKewu. Thanks!!!
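
For anyone following along, the clash between a UTF-8 part and an ASCII-8BIT part can be reproduced without REXML. This is only a minimal sketch: `tag_part` and `context_part` are hypothetical stand-ins for the two halves of the message, not REXML's real variables, and the `scrub`-based workaround at the end is an assumption for illustration, not necessarily what the merged fix does.

```ruby
# Hypothetical stand-ins for the two parts of the exception message.
# tag_part is UTF-8 and contains a non-ASCII character (U+200B).
tag_part = "Missing end tag for 'a\u200B'"
# context_part holds the same kind of bytes, but tagged as ASCII-8BIT (binary).
context_part = "a=<a \u200B>".dup.force_encoding("ASCII-8BIT")

# Appending a non-ASCII binary string to a non-ASCII UTF-8 string fails.
error = begin
  tag_part.dup << context_part
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class

# One possible workaround (an assumption): retag the context as UTF-8 and
# replace any invalid byte sequences before concatenating.
safe_context = context_part.dup.force_encoding("UTF-8").scrub("?")
message = tag_part.dup << safe_context
puts message.encoding
puts message.valid_encoding?
```

Each part is harmless on its own; the failure only appears when both contain non-ASCII bytes, which is why plain ASCII input never hits it.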
I get the following backtrace when I load the XML:
The XML is UTF-8 encoded and contains invalid characters, but ParseException#to_s uses the ASCII-8BIT encoding, so to_s itself raises an encoding exception and the user never gets the actual error information about the XML.
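
To illustrate where an "invalid byte sequence in UTF-8" error can come from, here is a small sketch that does not involve REXML at all. The string `raw` is a made-up example of badly encoded input, and using `scrub` on the caller's side is just one assumed way to sanitize it before parsing, not something the library requires.

```ruby
# Made-up input: tagged as UTF-8, but \xFF is never valid in UTF-8.
raw = "<a>\xFF</a>".dup.force_encoding("UTF-8")
puts raw.valid_encoding?

# Regexp operations on a string with a broken encoding raise ArgumentError.
error = begin
  raw.gsub(/\n/, " ")
  nil
rescue ArgumentError => e
  e
end
puts error.class

# Retagging the bytes as ASCII-8BIT makes the same gsub succeed, since any
# byte is valid in a binary string.
binary = raw.dup.force_encoding("ASCII-8BIT")
flattened = binary.gsub(/\n/, " ")
puts flattened.bytesize

# Caller-side alternative (an assumption): replace the invalid bytes up front.
cleaned = raw.scrub("?")
puts cleaned
puts cleaned.valid_encoding?
```

Forcing ASCII-8BIT avoids the invalid-sequence error, but the result can then no longer be appended to a UTF-8 string that contains non-ASCII characters, which is the second half of this problem.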