PrettyPrintWriter fails to serialize characters in the Unicode Supplementary Multilingual Plane in XML 1.0 mode and XML 1.1 mode #337

Open
wants to merge 1 commit into
base: master
@@ -206,9 +206,7 @@ protected void writeText(final QuickWriter writer, final String text) {
}

private void writeText(final String text, final boolean isAttribute) {
- final int length = text.length();
- for (int i = 0; i < length; i++) {
- final char c = text.charAt(i);
+ text.codePoints().forEach(c -> {
Contributor

I guess 1fcfa0b makes this (@since 9) safe.

Contributor Author

I have no idea what you are talking about in this review comment. The method is present in Java 8.

Contributor @jglick, May 3, 2023

Perhaps. Was just going by https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#codePoints() which says 9. At any rate I would hope the CI build would fail if this were not permitted.

Contributor Author

> Perhaps

Do you have any evidence for this claim which is casting doubt on the correctness of this change and potentially making it harder for subsequent reviewers to approve? If you do not, I would suggest that you refrain from making such review comments.

https://docs.oracle.com/javase/8/docs/api/java/lang/CharSequence.html#codePoints--

Contributor

The above link. It seems the @since tags are contradictory, unless the JDK team has a policy of noting when an override of a default method was added (which would seem strange to me since that should not change the API surface).

Contributor Author

https://docs.oracle.com/javase/8/docs/api/java/lang/CharSequence.html#codePoints-- is present in Java 8 and this code compiles successfully on Java 8. As far as I can tell there is no action item here, and this whole review comment was unnecessary and served only to chew up some of my time to refute an unverified claim as well as potentially confusing future reviewers.

Member

XStream 1.5.x will target Java 11, so there is no longer any point in keeping Java 8 as the minimum.

Contributor Author

But XStream 1.4 still uses Java 8, and we want this critical bug fix in that line. Anyway, this change works in Java 8, so this whole thread is pointless. I have no idea why this review feedback was left in the first place.

codePoints() was added to the CharSequence interface as a default method in Java 8.
In Java 9, an override of this method was added to String (which implements CharSequence).

So it works on both Java 8 and Java 9, and it may be slightly faster for Strings on Java 9+ thanks to the optimised override added to String.
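To make the difference between the old loop and the new one concrete, here is a minimal sketch (not part of the PR) that prints what a charAt(i) loop sees versus what codePoints() yields for the string used in the new tests. The class and method names are hypothetical; only String.charAt, CharSequence.codePoints and Integer.toHexString are real JDK calls, and the code compiles on Java 8.

```java
import java.util.stream.Collectors;

public class CodePointsDemo {
    // Hex form of each UTF-16 code unit: what a charAt(i) loop iterates over.
    static String charUnitHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(text.charAt(i)));
        }
        return sb.toString();
    }

    // Hex form of each Unicode code point: what codePoints() iterates over.
    static String codePointHex(String text) {
        return text.codePoints()
            .mapToObj(Integer::toHexString)
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String fox = "\uD83E\uDD8A"; // U+1F98A, the value used in the new tests
        System.out.println(charUnitHex(fox));  // prints "d83e dd8a"
        System.out.println(codePointHex(fox)); // prints "1f98a"
    }
}
```

The per-char loop sees two surrogate halves, while the code point stream sees one supplementary character.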

switch (c) {
case '\0':
if (mode == XML_QUIRKS) {
@@ -238,7 +236,7 @@ private void writeText(final String text, final boolean isAttribute) {
case '\t':
case '\n':
if (!isAttribute) {
- writer.write(c);
+ writer.write(Character.toChars(c));
Contributor

(Unnecessary in this case I think.)

Contributor Author

> Unnecessary in this case I think.

How would it compile without this hunk?

Contributor

Right, I just meant in this case we know the character will be a single char. Not important.

Contributor Author

Right, and I knew that when deciding to use Character.toChars(c) in this case and the case below rather than prematurely optimizing by casting the int to a char.

This review comment was unnecessary in this case I think.
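For readers following this thread: inside the lambda, c is an int code point rather than a char, so it has to be converted before being written as text. The sketch below (hypothetical class and method names, not code from the PR) contrasts Character.toChars with a plain cast. For '\t' and '\n' both give the same single char; for a supplementary code point only Character.toChars is correct.

```java
public class ToCharsDemo {
    // Converts a code point to its UTF-16 form: one char for BMP values,
    // a surrogate pair for supplementary values.
    static String viaToChars(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Narrowing cast: keeps only the low 16 bits of the code point.
    static String viaCast(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    public static void main(String[] args) {
        int newline = '\n';
        int fox = 0x1F98A;

        // Same result for a BMP character such as '\n'.
        System.out.println(viaToChars(newline).equals(viaCast(newline))); // prints "true"

        // Different result for a supplementary character.
        System.out.println(viaToChars(fox).length());               // prints "2"
        System.out.println(Integer.toHexString(viaCast(fox).charAt(0))); // prints "f98a"
    }
}
```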

break;
}
//$FALL-THROUGH$
@@ -251,7 +249,7 @@ private void writeText(final String text, final boolean isAttribute) {
+ " in XML stream");
}
}
- writer.write(c);
+ writer.write(Character.toChars(c));
Contributor

Note that this could be slightly less efficient since it allocates a char[]. It does not seem that the method overall is optimized.

Contributor Author

Is there an action item here? If not, then what is the purpose of this comment?

Contributor

It is not an action item; it is solely a note for other reviewers that this change could affect performance, if that is even a consideration.
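If the char[] allocation ever did matter, one possible alternative is to branch on whether the code point is in the Basic Multilingual Plane and write the surrogates individually otherwise. This is only a sketch under assumptions: it uses java.io.StringWriter as a stand-in for the real writer, the class and method names are hypothetical, and nothing here has been measured to be faster.

```java
import java.io.StringWriter;

public class WriteCodePointDemo {
    // Writes one code point without allocating a char[].
    // StringWriter is a stand-in; the PR itself writes to XStream's QuickWriter.
    static void writeCodePoint(StringWriter out, int codePoint) {
        if (Character.isBmpCodePoint(codePoint)) {
            // Writer.write(int) writes a single char taken from the low 16 bits.
            out.write(codePoint);
        } else {
            out.write(Character.highSurrogate(codePoint));
            out.write(Character.lowSurrogate(codePoint));
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        writeCodePoint(out, 'a');
        writeCodePoint(out, 0x1F98A);
        System.out.println(out.toString().equals("a\uD83E\uDD8A")); // prints "true"
    }
}
```

The trade-off is an extra branch and two more lines per call site, which is why using Character.toChars(c) throughout is a reasonable default.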

} else {
if (mode == XML_1_0) {
if (c < 9 || c == '\u000b' || c == '\u000c' || c == '\u000e' || c >= '\u000f' && c <= '\u001f') {
@@ -272,7 +270,7 @@ private void writeText(final String text, final boolean isAttribute) {
writer.write(';');
}
}
- }
+ });
}

@Override
@@ -295,6 +295,33 @@ public void testThrowsForInvalidUnicodeCharacterslInXml1_1Mode() {
assertXmlProducedIs("<tag>&#xd7ff;\ue000\ufffd</tag>");
}

+ public void testSupportsSupplementaryMultilingualPlaneInQuirks_Mode() {
+ writer = new PrettyPrintWriter(buffer, PrettyPrintWriter.XML_QUIRKS);
+ writer.startNode("tag");
+ writer.setValue("\uD83E\uDD8A");
+ writer.endNode();
+
+ assertXmlProducedIs("<tag>\uD83E\uDD8A</tag>");
+ }
+
+ public void testSupportsSupplementaryMultilingualPlaneInXml1_0Mode() {
+ writer = new PrettyPrintWriter(buffer, PrettyPrintWriter.XML_1_0);
+ writer.startNode("tag");
+ writer.setValue("\uD83E\uDD8A");
+ writer.endNode();
+
+ assertXmlProducedIs("<tag>\uD83E\uDD8A</tag>");
+ }
+
+ public void testSupportsSupplementaryMultilingualPlaneInXml1_1Mode() {
+ writer = new PrettyPrintWriter(buffer, PrettyPrintWriter.XML_1_1);
+ writer.startNode("tag");
+ writer.setValue("\uD83E\uDD8A");
+ writer.endNode();
+
+ assertXmlProducedIs("<tag>\uD83E\uDD8A</tag>");
+ }
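
The new tests can be read against the following simplified sketch of why per-char processing rejects a valid surrogate pair while per-code-point processing accepts it. It assumes the writer rejects any value in the surrogate range D800 to DFFF in the XML 1.0 and 1.1 modes; that check is not shown in the hunks above, and the class and method names here are hypothetical.

```java
public class SurrogateCheckDemo {
    // Assumed validity rule: values in the UTF-16 surrogate range are rejected.
    static boolean inSurrogateRange(int value) {
        return value >= 0xD800 && value <= 0xDFFF;
    }

    // Old style: inspect each UTF-16 code unit separately.
    static boolean rejectedPerChar(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (inSurrogateRange(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // New style: inspect whole code points; a valid pair becomes one value above 0xFFFF.
    static boolean rejectedPerCodePoint(String text) {
        return text.codePoints().anyMatch(SurrogateCheckDemo::inSurrogateRange);
    }

    public static void main(String[] args) {
        String fox = "\uD83E\uDD8A";       // valid pair, U+1F98A
        String loneSurrogate = "\uD83E";   // unpaired high surrogate

        System.out.println(rejectedPerChar(fox));                // prints "true"
        System.out.println(rejectedPerCodePoint(fox));           // prints "false"
        System.out.println(rejectedPerCodePoint(loneSurrogate)); // prints "true"
    }
}
```

Under this assumption, unpaired surrogates are still rejected after the change, because codePoints() passes them through unchanged.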

private String replace(final String in, final char what, final String with) {
final int pos = in.indexOf(what);
if (pos == -1) {