-
Notifications
You must be signed in to change notification settings - Fork 399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RegEx Validation Using Java Patterns (perf) #176
base: master
Are you sure you want to change the base?
Conversation
While using this library we found RhinoHelper takes way too much time for pattern validations using Visual VM CPU Profiling. Here is an observation: - JSON Schema: Roughly 30 fields, 10 fields have one or the other validation, 5 of them have patterns (listed below) - Input Data: An array of 10K messages - Time Taken: -- With out pattern annotation: ~6 seconds -- With Rhino Helper: ~30 seconds -- With Java Patterns: ~7 seconds Patterns used: "pattern": "^(19[7-9][0-9]|[2-9][0-9][0-9][0-9])-", "pattern" : "^[+-][0-1][0-9]:[0-5][0-9]$", "pattern": "^(19[7-9][0-9]|[2-9][0-9][0-9][0-9])-", "pattern": "[^\\s]{1}", "pattern": "[^\\s]{1}",
Hi, thanks for the pull request. Sadly the json-schema spec explicitly dictates that the patterns should be ecma-262 style regexes. Changing to just use java's pattern matcher will more than likely introduce some incompatibilities. Maybe if we can dependency inject the pattern matcher you can have your own "non-standard" but fast checker. Or if there is a native version of java regexp matcher that more closely mirrors javascript rules. |
Thanks for looking into this and a quick response, huggsboson. I was not aware of ECMA standard alignment. So, scratch my patch :). Coming to alternatives, I would like to have following changes (which I
If both of the above changes are possible (and does not conflict with ECMA Two of the patterns we use are actually date patterns :) Thanks, On Wed, Feb 17, 2016 at 3:07 AM, huggsboson notifications@github.com
Laxmi |
I'm trying to think... this feels like something where a native date-time type would do a better job than trying to extend regular expressions. I'm less familiar with what we have already built-in with regards to dates and times, but I believe there is something. (Maybe see if the spec has a reference to dates?) |
Is this still and issue? I have noticed that the issue has not been commented on since March 2016 and that the latest version of the validator allows for the "format":"date-time" keywords which implements the RFC 3339, section 5.6 interpretation of date time. I am therefore assuming this issue to be no longer relevant. I tested this with the following code:
|
This Sampath an engineer who worked with @olnrao, and observed the performance issue with regex pattern matchings. I will run the same test data against new code in a couple of days and will let you know whether performance has really improved or not. |
public PatternValidator(final JsonNode digest) | ||
{ | ||
_patterns = new HashMap<String, Pattern>(); | ||
super("pattern"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super should be the first statement in the constructor, otherwise, compilation fail with
call to super must be first statement in constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@olnrao there is compilation issue please fix that, I have fixed that compilation locally and tested with following.
|
The actual issue for which this pull request was raised is, for regular expression in the schema, pattern matching performance is very poor for this library since it is not storing the compiled pattern of regular expression , for very JSON message validation we have date-time format fields in our scheme but we want min & max values for this fields. the date-time format in the schema can be applied on string type, but there is no native date-time type in json schema. so basically along with date-time format support in JSON schema we need, min date and max date support in JSON shema, schema So I strongly support this fix and recommend fix should go to master branch. Please let me know any thing I have missed here or mis-interpreted. |
@sampath-methuku aside from the change you mentioned above that fixes the compilation error, " |
While using this library we found RhinoHelper takes way too much time for pattern validations using Visual VM CPU Profiling.
Here is an observation:
-- With out pattern annotation: ~6 seconds
-- With Rhino Helper: ~30 seconds
-- With Java Patterns: ~7 seconds
Patterns used:
"pattern": "^(19[7-9][0-9]|[2-9][0-9][0-9][0-9])-",
"pattern" : "^[+-][0-1][0-9]:[0-5][0-9]$",
"pattern": "^(19[7-9][0-9]|[2-9][0-9][0-9][0-9])-",
"pattern": "[^\s]{1}",
"pattern": "[^\s]{1}",