Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes XML Unescaping #362

Merged
merged 1 commit into from
Aug 27, 2017
Merged

Fixes XML Unescaping #362

merged 1 commit into from
Aug 27, 2017

Conversation

johnjaylward
Copy link
Contributor

@johnjaylward johnjaylward commented Aug 19, 2017

Key Changes:

Fixes #361

  • Removes unescape from the XML class calls
  • fixes bug with unescape method
  • moves unescape logic into the XMLTokener class for more consistency

What problem does this code solve?
Bug fix for XML Escaping that was included in a2d3b59 as part of a fix supporting unicode entities (i.e. &#xA which maps to a new line character)

The original fix was causing XML string to be unescaped twice which was also causing the bug as mentioned in #361.

Risks
Low/none. This is a bug fix to correct incorrect parsing of XML entities.

Changes to the API?
One new package level method was added to XMLTokener that handles all valid XML entity unescaping.

Will this require a new release?
Yes. The bug affects processing XML documents only, however, the bug prevents valid XML files from being read.

Should the documentation be updated?
No

Does it break the unit tests?
No. All unit tests should pass. New unit tests were added in order to test the changes and ensure that entities are only unencoded once. See stleary/JSON-Java-unit-test#79

Was any code refactored in this commit?
Yes. The actual implementation for the unescapeing was moved from the XML class to the XMLTokener class.

XMLTokener was already unescaping named entities (i.e. &lt; to < and &amp; to &) however it was not converting unicode entities back to their codepoints (i.e &#34 to " or &#x34 to 4). To compensate for that I originally created the XML.unescape method and incorrectly injected it into the XML processing.

This PR corrects that by moving all the entity handling into the XMLTokener class and removing the unescape injection that I placed into the processing last time.

Review status
APPROVED Starting 3 day window for comments.

* Removes unescape from the XML class calls
* fixes bug with unescape method
* moves unescape logic into the XMLTokener class for more consistency
@stleary
Copy link
Owner

stleary commented Aug 23, 2017

Sorry for the delay, back from eclipse, approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Xml.java doesn't cope with encoded ampersand followed by semicolon
2 participants