Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizes opt* functions #337

Merged
merged 12 commits into from
May 23, 2017
Merged

Optimizes opt* functions #337

merged 12 commits into from
May 23, 2017

Conversation

johnjaylward
Copy link
Contributor

@johnjaylward johnjaylward commented May 16, 2017

Adjustments to the opt* functions in reference to #334

Notable changes:

  • New API functions for optFloat(key, default) and optFloat(key)
  • New API functions for optNumber(key, default) and optNumber(key)
  • New API functions for getNumber and getFloat
  • Update all opt* functions that relied on the get* function throwing an exception to no longer rely on that.

Test speedup over the old implementation is 870s -> 120s for a 2.1 GB file (about 14% of the time; or 86% time saved). See #337 (comment) and #337 (comment)

What problem does this code solve?
This is a performance enhancement, not a bug fix. Some code is refactored to prevent exceptions from being thrown. Type coercion is also standardized between all the opt* number functions.

Risks
low

Changes to the API?
New methods added for optFloat and optNumber.

Will this require a new release?
No, can be rolled into the next release

Should the documentation be updated?
No

Does it break the unit tests?
Broke one unit tests for BigDecimal->BigInteger conversion due to updated coercion rules. Tests has been updated and new tests added for new methods optFloat and optNumber. See stleary/JSON-Java-unit-test#71 for full changes.

Was any code refactored in this commit?
Yes

Review status
ACCEPTED, Starting 3 day timer for comments

@TheMatthew
Copy link

TheMatthew commented May 17, 2017

Tested on a 2.1 Gb json file on an i7-620 with an ssd.
Before parsing with lots of extra logic: 870 s
After: 120 s

This is in a real-life application. I would assume that micro-benchmarks would yield a gain of at least 20x.

All in all, this is great, thanks!

@johnjaylward
Copy link
Contributor Author

Is the 870s the original JSONObject implementation with no pre-checks, or the modified one you linked in the issue that uses the has(key) calls? I'd just like to verify what we are looking at.

@TheMatthew
Copy link

Sorry for the delay

No changes origin/master: 870s
has(key): 122s
your change: 120s

John J. Aylward added 3 commits May 17, 2017 11:29
* Add support to JSONArray and JSONObject to optionally get raw number values
* Add support to JSONArray and JSONObject to optionally get float values
@johnjaylward
Copy link
Contributor Author

Thanks @TheMatthew . This PR should be feature complete now. I'll update the description to reflect all the changes.

@johnjaylward
Copy link
Contributor Author

Tests updated and PR description updated to match testing changes.

@stleary
Copy link
Owner

stleary commented May 18, 2017

Nice, performance improved and code made more readable. Accepted pending 3 day timer.

@guidomedina
Copy link

@johnjaylward
Do you know how to squash your commits? will make it much easier now and in the future to code review.

@johnjaylward
Copy link
Contributor Author

@guidomedina it's easy to code review without erasing history. Please stop spaming

@guidomedina
Copy link

guidomedina commented May 18, 2017

1st time I have been told to "stop spamming" on Github in my 5 years here, I only made 3 comments, valid comments so not sure what you mean.

It is easy to code review if using a diff between branches, maybe you are new to contributing but most projects I have contributed will ask you to squash the commits, it is a valid request, maybe not this project.

@johnjaylward
Copy link
Contributor Author

The comments on squashing commits are not relevant to the pull request nor helpful to the review. This is not a discussion board for the merits or pitfalls of rebasing.

JSONObject.java Outdated
@@ -578,10 +581,10 @@ public double getDouble(String key) throws JSONException {
Object object = this.get(key);
try {
return object instanceof Number ? ((Number) object).doubleValue()
: Double.parseDouble((String) object);
: new BigDecimal((String) object).doubleValue();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing a BigDecimal to parse a double will add too much overhead for no reason, if you try to represent a double from a String it will stay as is, if it was indeed a double then nothing will be gained or lost, you are only adding more GC overhead.

Same thing for your other BigDecimal introduction when working with float or double

Copy link
Contributor Author

@johnjaylward johnjaylward May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Double.parse will throw an exception if the number can not be represented as a double.
From the javadocs:

double java.lang.Double.parseDouble(String s) throws NumberFormatException

Returns a new double initialized to the value represented by the specified String, as performed by the valueOf method of class Double.
Parameters: s the string to be parsed.
Returns:the double value represented by the string argument.
Throws: NullPointerException - if the string is null
NumberFormatException - if the string does not contain a parsable double.

Since:1.2
See Also:java.lang.Double.valueOf(String)

Double.valueOf seems to do something closer to what you are trying to accomplish, but it would be the same amount of garbage collection.

new BigDecimal(String) will always parse the value as long as it's a valid decimal number (i.e. not "abc"). The doubleValue call will then do a narrowing conversion which may change the value to -Inf or +Inf.

Lastly, this change would be out of scope for this PR. the get* methods were not modified here, only the opt*. If you feel we should evaluate it further, feel free to open another pull request with the changes and notes to performance impacts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new BigDecimal(...) and Double.parseDouble(...) have the same behavior, an input that fails for parseDouble will also fail with new BigDecimal, the JavaDoc have different wording but the end result will be the same.

As for relevance, you are trying to make opt* operations faster and yet you are introducing an expensive and unneeded instantiation when the return value is a native (small d)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc is what to go by. If someone is running on a different VM with a different implementation, they may not function the same.

The BigDecimal conversion is not that expensive. I'd like to see some stats on it before making a change like that. We have no guarantees on the input. Just because someone asks for an int does not mean I would want to use Integer.parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not "introducing" the big decimal conversion. It was there before the refactor.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then int typecast would be wrong for my proposed shortcut but it can be easily mitigated by checking if the double value is between Integer.MIN_VALUE and Integer.MAX_VALUE and if it isn't throw an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's an opt* function. The entire point is that it does not throw an exception. I'm not disagreeing that the parsing performance can be improved. But I think it needs more than a cursory glance and wishful thinking.

This PR is focusing strictly on the request made in #337 which was to mitigate the Exception handling impact.

Any parsing impacts should be in a new PR after this one is merged.

Any narrowing coercion has the potential to change the number. This is a given and also why I added the optNumber call to the API.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Narrowing will not change the double for long, maybe for int, but then it is the same for BigDecimal calls, the reason I commented into this particular change is because of what I'm seeing:

-                    : Double.parseDouble((String) object);
+                    : new BigDecimal((String) object).doubleValue();

Am I reading this incorrectly? I'm seeing Double.parseDouble(...) changed to new BigDecimal(...) in this commit.

Narrowing will cause loss always, but the same will happen with BigDecimal hence my point of not bothering with such overhead at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me check that. If I did change it, I'll likely change it back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I must have changed that by accident when I was testing stuff and didn't notice. I'll revert that change back and update the respective optDouble and optFloat calls that I copied that to.

As for the narrowing, it is handled differently between BigDecimal and Double. Also, not all Long values fit in a double. see the test case changes @ stleary/JSON-Java-unit-test#71 that I updated today. A Double only has 53 bits for it's mantissa which will not let any value over 2^53 be represented correctly.

John J. Aylward added 2 commits May 18, 2017 11:41
* Don't call toString when we know it's a string, just cast
* Updates optNumber to be smarter about which object it uses to parse strings
@johnjaylward
Copy link
Contributor Author

@stleary I added the optFloat/optNumber, but no associated getFloat/getNumber. Should I add them in for posterity?

@stleary
Copy link
Owner

stleary commented May 18, 2017

@johnjaylward Sure, in general when an opt* method is added, a corresponding get*method is a good idea.

John J. Aylward added 3 commits May 18, 2017 14:24
* Extracts the stringToNumber logic that the optNumber method uses to reuse it between classes
* Fixes -0 issue with optNumber/getNumber
@johnjaylward
Copy link
Contributor Author

@TheMatthew can you retest the latest commit of this? I want to ensure the new pull out of the number parsing functions didn't adversely affect the performance of your test case.

return val.indexOf('.') > -1 || val.indexOf('e') > -1
|| val.indexOf('E') > -1 || "-0".equals(val);
}

Copy link

@guidomedina guidomedina May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice "quick and dirty" function, what happens if running on a server with a different locale?

Anyway the following example can help:

static final char MINUS_SIGN;
static final char DECIMAL_SEPARATOR;
  
static {
   final DecimalFormatSymbols currentLocaleSymbols = DecimalFormatSymbols.getInstance();
   MINUS_SIGN = currentLocaleSymbols.getMinusSign();
   DECIMAL_SEPARATOR = currentLocaleSymbols.getDecimalSeparator();
}

Copy link

@guidomedina guidomedina May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I'm not sure what should be done if the decimal separator for the server locale is a comma, not sure how the parse function will behave, might not be worth the effort in supporting this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not looking for any number, we are looking for JSON numbers, which are not locale dependent.

Copy link

@guidomedina guidomedina May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it, ignore this completely, FloatingDecimal line 198 & 224:

46 = dot

if(var4 != 46) {
  break;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from BigDecimal:

                    } else if (c == '.') {   // have dot
                        // have dot
                        if (dot) // two dots
                            throw new NumberFormatException();
                        dot = true;

Double.parse is similar.

* prevents erasure of stack trace for rethrown exceptions
@TheMatthew
Copy link

Sorry, I was in a huge rush, I will retest.

@TheMatthew
Copy link

Before this patch AKA current git head: 870 s
My suggested simple patch: 122 s
@johnjaylward 's first patch: 120 s
@johnjaylward 's current patch: 126 s

@stleary
Copy link
Owner

stleary commented May 21, 2017

Performance still looks good. @johnjaylward let me know when you are ready to proceed with the pull request.

@TheMatthew
Copy link

TheMatthew commented May 21, 2017

Ok, I started profiling. the code in this commit looks moderately tight.

Some low hanging fruits I saw:
JSONObject.keys() should probably never be used, rather it would be better to do an enhanced for loop on JSONObject#keySet()

This brings the execution time down from 126 s to 115.

Then if you have an array let's say called "CHARS_TO_APPEND" which has 256 values, Like this:

    private static final boolean CHARS_TO_APPEND[]= new boolean[256];
static{
	Arrays.fill(CHARS_TO_APPEND, true);
	for(int i = 0; i < ' '; i++){
		CHARS_TO_APPEND[i]=false;
	}
	CHARS_TO_APPEND[',']=false;
	CHARS_TO_APPEND[':']=false;
	CHARS_TO_APPEND[']']=false;
	CHARS_TO_APPEND['}']=false;
	CHARS_TO_APPEND['/']=false;
	CHARS_TO_APPEND['\\']=false;
	CHARS_TO_APPEND['"']=false;
	CHARS_TO_APPEND['[']=false;
	CHARS_TO_APPEND['{']=false;
	CHARS_TO_APPEND[';']=false;
	CHARS_TO_APPEND['=']=false;
	CHARS_TO_APPEND['#']=false;
}

and replace the while loop in JSONTokener#nextValue() with

while (CHARS_TO_APPEND[c]) {

you reach 113s.

Here is a quick profile I did. It can be opened with visualvm 1.3.9 .
JSON-java.nps.zip
The jist of it is that most of your time is now spent in the tokener, which is good and the reader, which is excellent.

Hope this helps.

edit: looking at the examples of acceleration really shows how close this is to optimal. at this point optimizing if statements yields a performance change. that is typically a sign that the low hanging fruits are almost all gone.

@TheMatthew
Copy link

For transparency purposes I've shared this github benchmark https://github.com/TheMatthew/json-benchmark

it's under apache so it should be compatible for any copy-pasted tests

Initial results:

org.json.simple: 140s,
org.json: 870s
after the patch: 126s

@johnjaylward
Copy link
Contributor Author

@stleary I agree, this still looks good. Let's call this one done. I just pulled in the changes recently merged to master into this PR.

@TheMatthew Lets take that optimization to a separate PR after this one is merged.

@stleary stleary merged commit fbd2be7 into stleary:master May 23, 2017
stleary added a commit to stleary/JSON-Java-unit-test that referenced this pull request May 23, 2017
@johnjaylward johnjaylward deleted the OptimizeOpt branch July 7, 2017 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants