-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geo Point parse error fix #40447
Geo Point parse error fix #40447
Conversation
When geo point parsing threw a parse exception, it did not consume remaining tokens from the parser. This in turn meant that indexing documents with malformed geo points into mappings with ignore_malformed=true would fail in some cases, since DocumentParser expects geo_point parsing to end on the END_OBJECT token. Related to elastic#17617
Pinging @elastic/es-analytics-geo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had the same issues on geoshape side and used a slightly different approach in geoshape parsing. We now have a special wrapper for XContentParser that allows skipping children in case of a parse error regardless on where the error occurred. It allows us to have one catch all section instead of thinking about every single path that things can go wrong. It also guarantees that we will never consume more tokens than we are supposed to or leave some hanging around. Would you be interested in trying this approach instead?
When geo point parsing threw a parse exception, it did not consume remaining tokens from the parser. This in turn meant that indexing documents with malformed geo points into mappings with ignore_malformed=true would fail in some cases, since DocumentParser expects geo_point parsing to end on the END_OBJECT token. Related to elastic#17617
Thanks for the tip on the sub parser, @imotov. I adjusted to use that approach in e4791a9. I changed the sub parser to also support arrays (since they have the same fundamental issue) and chose to also support non-object/non-arrays (this makes it much simpler to use at least for the geo point case). Forgot to fix |
Improved XContentSubParser to allow any token, which is useful for wrapping in cases where both object and values are allowed. Related to elastic#17617
@imotov this should now be ready for another review. A few notable changes:
Thinking more about this, it could be desirable to let the DocumentParser create the sub parser, ensuring that individual mappers do not have to consider this. I chose not to pursue this for now (larger effort). |
Could you explain the motivation behind supporting values a bit more? The original intent of the subparser was to create a parsing context that the parser cannot escape or underconsume in case of an error. In case of the object the context is the entire object until the corresponding closing bracket. I can clearly see how this logic can be extended to arrays, where the context is all elements until the closing |
There are two primary motivations for also supporting value:
|
I feel like considering the amount of complexity that supporting a single value adds, it might be easier to just wrap after we encountered an object or an array instead of blindly wrapping every time on the outside. Basically just wrap here and here. This case we can get away with all the complexity of handling values. |
I am not sure I follow you on this, the change to support values was really a one liner (initialize level to 1 or 0 depending on current token). The rest of the checks were added to support that whatever comes out of nextToken() will also come out of currentToken(). Before this, you could forward the subparser beyond the END, get null back from nextToken() and then call currentToken() or any of the other accessor methods to get the value. This is an existing issue before this PR, but now that we start using the sub parser more it made sense to me to ensure this property holds also for the sub parser. I do not mind following your suggestion, but I wanted to clarify above first to ensure this was included in your evaluation before I do the changes. |
I agree that we should fix currentToken and currentName, but I think all other accessors method become only the real issue if you allow stopping on values not on the end of object or array. Otherwise, you are calling them on the "}" or "]", they return some nonsense, but I don't think we have to make them consistent with nonsense that is returned when you go beyond the end with normal parser. In any case I would prefer not to wrap the values, since it gives too much runtime overhead for too little value in my opinion. |
Reverted to a minimalistic change. Related to elastic#17617
@imotov, reverted my previous changes and made a minimalistic change to fix the issue reported. Please have another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for doing this! Really appreciate it.
* master: (25 commits) [DOCS] Correct keystore commands for Email and Jira actions in Watcher (elastic#40417) [DOCS] Document common settings for snapshot repository plugins (elastic#40475) Remove with(out)-system-key tests (elastic#40547) Geo Point parse error fix (elastic#40447) Handle null retention leases in WaitForNoFollowersStep (elastic#40477) [DOCS] Adds anchors for ruby client (elastic#39867) Mute DataFrameAuditorIT#testAuditorWritesAudits Disable integTest when Docker is not available (elastic#40585) Add randomScore function in script_score query (elastic#40186) Test fixtures krb5 (elastic#40297) Correct ILM metadata minimum compatibility version (elastic#40569) SQL: Centralize SQL test dependencies version handling (elastic#40551) Mute testTracerLog Mute testHttpInput Include functions' aliases in the list of functions (elastic#40584) Optimise rejection of out-of-range `long` values (elastic#40325) Add docs for cluster.remote.*.proxy setting (elastic#40281) Migrate systemd packaging tests from bats to java (elastic#39954) Move top-level pipeline aggs out of QuerySearchResult (elastic#40319) Use openjdk 12 in packer cache script (elastic#40498) ...
* elastic/master: (77 commits) Update ingest jdocs that a null return value will drop the current document. (elastic#40359) Remove -Xlint exclusions in the ingest-common module. (elastic#40505) Update docs for the DFR similarity (elastic#40579) Fix merging of search_as_you_type field mapper (elastic#40593) Support roles with application privileges against wildcard applications (elastic#40398) Remove obsolete security settings (elastic#40496) Remove Gradle deprecation warnings (elastic#40449) Run the build integ test in parallel (elastic#39788) Fix 3rd pary S3 tests (elastic#40588) lower bwc skip for search as you type (elastic#40599) Muting XContentParserTests#testSubParserArray Fixing typo in test error message (elastic#40611) Update max dims for vectors to 1024. (elastic#40597) Add start and stop time to cat recovery API (elastic#40378) [DOCS] Correct keystore commands for Email and Jira actions in Watcher (elastic#40417) [DOCS] Document common settings for snapshot repository plugins (elastic#40475) Remove with(out)-system-key tests (elastic#40547) Geo Point parse error fix (elastic#40447) Handle null retention leases in WaitForNoFollowersStep (elastic#40477) [DOCS] Adds anchors for ruby client (elastic#39867) ...
* elastic/master: (129 commits) Update ingest jdocs that a null return value will drop the current document. (elastic#40359) Remove -Xlint exclusions in the ingest-common module. (elastic#40505) Update docs for the DFR similarity (elastic#40579) Fix merging of search_as_you_type field mapper (elastic#40593) Support roles with application privileges against wildcard applications (elastic#40398) Remove obsolete security settings (elastic#40496) Remove Gradle deprecation warnings (elastic#40449) Run the build integ test in parallel (elastic#39788) Fix 3rd pary S3 tests (elastic#40588) lower bwc skip for search as you type (elastic#40599) Muting XContentParserTests#testSubParserArray Fixing typo in test error message (elastic#40611) Update max dims for vectors to 1024. (elastic#40597) Add start and stop time to cat recovery API (elastic#40378) [DOCS] Correct keystore commands for Email and Jira actions in Watcher (elastic#40417) [DOCS] Document common settings for snapshot repository plugins (elastic#40475) Remove with(out)-system-key tests (elastic#40547) Geo Point parse error fix (elastic#40447) Handle null retention leases in WaitForNoFollowersStep (elastic#40477) [DOCS] Adds anchors for ruby client (elastic#39867) ...
Talked to @imotov and we think this should be backported up to 6.7. |
When geo point parsing threw a parse exception, it did not consume remaining tokens from the parser. This in turn meant that indexing documents with malformed geo points into mappings with ignore_malformed=true would fail in some cases, since DocumentParser expects geo_point parsing to end on the END_OBJECT token. Related to #17617
When geo point parsing threw a parse exception, it did not consume remaining tokens from the parser. This in turn meant that indexing documents with malformed geo points into mappings with ignore_malformed=true would fail in some cases, since DocumentParser expects geo_point parsing to end on the END_OBJECT token. Related to #17617
Backported to 7.x, 7.0 and 6.7. |
* elastic/7.0: [TEST] Mute WebhookHttpsIntegrationTests.testHttps [DOCS] Add 'time value' links to several monitor settings (elastic#40633) (elastic#40687) Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519) Remove mention of soft deletes from getting started (elastic#40668) Fix bug in detecting use of bundled JDK on macOS Reindex conflicts clarification (docs) (elastic#40442) SQL: [Tests] Enable integration tests for fixed issues (elastic#40664) Add information about the default sort mode (elastic#40657) SQL: [Docs] Fix example for CURDATE SQL: [Docs] Fix doc errors regarding CURRENT_DATE. (elastic#40649) Clarify using time_zone and date math in range query (elastic#40655) Add notice for bundled jdk (elastic#40576) disable kerberos test until kerberos fixture is working again [DOCS] Use "source" instead of "inline" in ML docs (elastic#40635) Unmute and fix testSubParserArray (elastic#40626) Geo Point parse error fix (elastic#40447) Increase suite timeout to 30 minutes for docs tests (elastic#40521) Fix repository-hdfs when no docker and unnecesary fixture Avoid building hdfs-fixure use an image that works instead
When geo point parsing threw a parse exception, it did not consume remaining tokens from the parser. This in turn meant that indexing documents with malformed geo points into mappings with ignore_malformed=true would fail in some cases, since DocumentParser expects geo_point parsing to end on the END_OBJECT token. Related to #17617
Fixed compilation on 6.7
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.
Related to #17617