Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bringup Java 11 b04 - implement j.l.String.valueOfCodePoint() #1466

Merged
merged 2 commits into from
Mar 22, 2018

Conversation

JasonFengJ9
Copy link
Member

@JasonFengJ9 JasonFengJ9 commented Mar 16, 2018

Bringup Java 11 b04 - implement j.l.String.valueOfCodePoint()

Implemented j.l.String.valueOfCodePoint(), manually verified that the output of String.valueOfCodePoint(codePoint).codePointAt(0) equals codePoint for all Unicode code point.

closes: #1439

Reviewer @DanHeidinga @fjeremic

Signed-off-by: Jason Feng fengj@ca.ibm.com

@DanHeidinga
Copy link
Member

Change looks reasonable to me. @fjeremic Any comments on the string compression part?

@pshipton
Copy link
Member

Rather than manual verification, please add the automated test.

Did you manually test with string compression enabled?

@JasonFengJ9
Copy link
Member Author

Manually verified with both command line options -XX:+CompactStrings and -XX:-CompactStrings.

Will work on automated tests tomorrow.

if (enableCompression) {
string = new String(compressedAsciiTable[codePoint], 0, 1, true);
} else {
string = new String(decompressedAsciiTable[codePoint], 0, 1, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use:

https://github.com/JasonFengJ9/openj9/blob/c864cc795b78b3b1c573e8bc5684c7f9f413c6af/jcl/src/java.base/share/classes/java/lang/String.java#L402

Constructor here instead? The one currently being used will have to perform many of the checks again. Similarly for all other assignments to string in this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String(byteArray, coder) works as well, will update the PR accordingly.

String string;
if ((codePoint < Character.MIN_CODE_POINT) || (codePoint > Character.MAX_CODE_POINT)) {
/*[MSG "K0800", "Invalid Unicode code point"]*/
throw new IllegalArgumentException(com.ibm.oti.util.Msg.getString("K0800")); //$NON-NLS-1$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to put case as the last else case as it occurs least often. The JIT will have to do less work when block shuffling and if it every got it wrong we wouldn't pay the performance penalty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the next check for <= 255 need to expand then? It would also need to validate codePoint > MIN_CODE_POINT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will following checking help JIT performance?

if ((codePoint < Character.MIN_CODE_POINT)  {
  // error handling 
} else if (codePoint <= 255) {
} else if (codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
} else if (codePoint <= Character.MAX_CODE_POINT) {
} else {
  // error handling
}

This saves the less often check (codePoint > Character.MAX_CODE_POINT) to the last.

@JasonFengJ9 JasonFengJ9 force-pushed the valueofcp branch 2 times, most recently from ceaaaf3 to 7e7b607 Compare March 21, 2018 12:09
@JasonFengJ9
Copy link
Member Author

@fjeremic the PR is ready for another review with following updates:

  1. Uses constructor String(byteArray, coder);
  2. Modified the order to save less often check (codePoint > Character.MAX_CODE_POINT) last;
  3. Added tests. Note: this test suite is for Java 11 hence will run when Java 11 are introduced to the build jobs. Manually verified that it passes against Java 11 raw builds with modified command line options.

@llxia please arrange a test team review.

Additional note: three project files failed Copyright check however this kind of files don't have copyright header either such as Java[10|9|8]andUp.

// Save less often check (codePoint > Character.MAX_CODE_POINT) last
if (codePoint < Character.MIN_CODE_POINT) {
/*[MSG "K0800", "Invalid Unicode code point"]*/
throw new IllegalArgumentException(com.ibm.oti.util.Msg.getString("K0800")); //$NON-NLS-1$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include the string or the the code point in the exception? It will make it much easier to debug one of these issues.

Copy link
Member Author

@JasonFengJ9 JasonFengJ9 Mar 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, now an exception looks like:

Standalone tests:
Caused by: java.lang.IllegalArgumentException: Invalid Unicode code point - -1
	at java.lang.String.valueOfCodePoint(java.base@11-internal/String.java:4589)

Caused by: java.lang.IllegalArgumentException: Invalid Unicode code point - 1114112
	at java.lang.String.valueOfCodePoint(java.base@11-internal/String.java:4589)

Automated tests:
FAILED: testValueOfCodePoint
java.lang.AssertionError: Unicode code point mismatch at 0 expected [1] but found [0]
	at org.testng.Assert.fail(Assert.java:96)
	at org.testng.Assert.failNotEquals(Assert.java:776)
	at org.testng.Assert.assertEqualsImpl(Assert.java:137)
	at org.testng.Assert.assertEquals(Assert.java:118)
	at org.testng.Assert.assertEquals(Assert.java:652)
	at org.openj9.test.java_lang.Test_String.testValueOfCodePoint(Test_String.java:49)

try {
String str = (String) method.invoke(null, i);
Assert.assertEquals(i, str.codePointAt(0), "Unicode code point mismatch at " + i);
} catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases shouldn't catch unexpected exceptions. Let the test code throw them and the test harness will (usually) print the relevant stacktrace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, removed catching of unexpected exceptions.

@llxia
Copy link
Contributor

llxia commented Mar 21, 2018

@renfeiw , could you review the test change? Thanks.

static String valueOfCodePoint(int codePoint) {
String string;
// Save less often check (codePoint > Character.MAX_CODE_POINT) last
if (codePoint < Character.MIN_CODE_POINT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vouch to say this is also an uncommon path and should be moved with the else path. Do we agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check that before checking } else if (codePoint <= 255) { or we need to move it into the check for <= 255. Either way, its got to be first.

Copy link
Contributor

@fjeremic fjeremic Mar 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but merging the two checks has the benefit of smaller footprint (since the first and the last blocks throw the same exception they can be merged) and less code to jump over to get to the 0 <= codePoint <= 255 case. i.e.

Current:

if (x < 0) {
    X;
} else if (x < 55) {
    Y;
} else if (x < 888) {
    Z;
} else {
    X;
}

can be simplified to:

if (x >= 0 && x < 55) {
    Y;
} else if (x >= 0 && x < 888) {
    Z;
} else {
    X;
}

Which both reduces footprint and makes the lest often taken case the last (X;). Though we do technically introduce another check. The JIT should do the right thing here though in whichever way it is written. I suppose if the JIT does not do this then we have a JIT bug we need to address. I'm fine with the code being either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly dislike the duplication of the second form. If we can't optimize the first form reasonably then we're going to have problems with most user code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are suggesting code like following:

if (codePoint >= Character.MIN_CODE_POINT && codePoint <= 255) {
  ...
} else if (codePoint >= Character.MIN_CODE_POINT && codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
  ...
} else if (codePoint >= Character.MIN_CODE_POINT && codePoint <= Character.MAX_CODE_POINT) {
  ...
} else {
  error handling
}

This introduces extra code and confusion (later reader need lots of explanation about the rational of this code logic). I don't think any performance gain here worth the potential issue later.
Now I feel this code should go back to original structure, i.e.,

if ((codePoint < Character.MIN_CODE_POINT) || (codePoint > Character.MAX_CODE_POINT)) {
  error handling
} else if (codePoint <= 255) {
  ...
} else if (codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
  ...
} else {
  ...
}

It is clear and easy to understand. Can we modify it to improve performance later if this becomes a hot spot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I hadn't thought about the extra checks introduced in my original comment. I'm fine either keeping as is or reverting back to your latest suggestion @JasonFengJ9. Apologies for the churn!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, updated PR as latest suggestion above.


<target name="build" >
<if>
<equals arg1="${JAVA_VERSION}" arg2="SE110"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to change this conditional check to "not SE80 && not SE90 && not SE100". This makes the build.xml to work for SE120 and up without any change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, changed the condition to not ^SE(8|9|10)0$$.

@renfeiw
Copy link
Contributor

renfeiw commented Mar 21, 2018

To enable SE110, we need to make an update in testKitGen. (#1495)

@JasonFengJ9
Copy link
Member Author

JasonFengJ9 commented Mar 21, 2018

The PR has been updated to address review comments.
Though Java 11 hasn't been built and run in any test job yet, I manually verified that Java11andUp test project can be built, run against a Java 11 raw build and detect problem as expected.
Please review again.

*/
static String valueOfCodePoint(int codePoint) {
String string;
// Save less often check (codePoint > Character.MAX_CODE_POINT) last
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is no longer accurate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in question has been removed.

// Character.MIN_CODE_POINT - 1 & Character.MAX_CODE_POINT + 1
try {
method.invoke(null, Character.MIN_CODE_POINT - 1);
method.invoke(null, Character.MAX_CODE_POINT + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second method.invoke will never run as the first show throw. We need this to be two tests or to have it written like:

try {
method.invoke(null, Character.MIN_CODE_POINT - 1);
fail()
} catch (IllegalArgumentException | InvocationTargetException e) {} 
try {
method.invoke(null, Character.MIN_CODE_POINT - 2);
fail()
} catch (IllegalArgumentException | InvocationTargetException e) {} 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test with suggested format above.
Please have another look.

Implemented j.l.String.valueOfCodePoint();
Added tests to verify that
String.valueOfCodePoint(codePoint).codePointAt(0) equals
codePoint for all valid Unicode code point.

Signed-off-by: Jason Feng <fengj@ca.ibm.com>
}
// Verify that IllegalArgumentException is thrown for Character.MIN_CODE_POINT - 1.
try {
method.invoke(null, Character.MIN_CODE_POINT - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs an explicit Assert.fail() call or the method could succeed and pass the test when we know it should throw the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, added Assert.fail(message).

}
// Verify that IllegalArgumentException is thrown for Character.MAX_CODE_POINT + 1.
try {
method.invoke(null, Character.MAX_CODE_POINT + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs an explicit Assert.fail() call or the method could succeed and pass the test when we know it should throw the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as well. Please have another look, thanks.

@DanHeidinga
Copy link
Member

Jenkins test sanity plinux

@DanHeidinga
Copy link
Member

Running one builds worth of sanity to ensure that this is correctly ignored by older Java levels.

@DanHeidinga DanHeidinga self-assigned this Mar 22, 2018
@DanHeidinga DanHeidinga merged commit 4a8df41 into eclipse-openj9:master Mar 22, 2018
@JasonFengJ9 JasonFengJ9 deleted the valueofcp branch June 20, 2018 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Latest JDK11 builds fail due to missing String.valueOfCodePoint(int) method
6 participants