Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support synthetic source for geo_point when ignore_malformed is used #109651

Merged

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Jun 12, 2024

This PR draws inspiration from #90777 but takes a slightly different approach in order to reduce the amount of plumbing required.

We should be able to reuse MemorizingXContentParser for other field types that accept objects.

Contributes to #106483.

@lkts lkts added :StorageEngine/Logs You know, for Logs :StorageEngine/Mapping The storage related side of mappings labels Jun 12, 2024
@lkts lkts requested a review from a team as a code owner June 12, 2024 21:39
Copy link
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@lkts
Copy link
Contributor Author

lkts commented Jun 13, 2024

@elasticmachine update branch

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

@lkts lkts removed the :StorageEngine/Logs You know, for Logs label Jun 13, 2024
@@ -29,6 +29,10 @@ public static StoredField storedField(String name, XContentParser parser) throws
return XContentDataHelper.storedField(name(name), parser);
}

public static StoredField storedField(String name, XContentBuilder builder) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: short method comment.

id: "1"

- match: { _source.geo_point.0.lon: -71.34000004269183 }
- match: { _source.geo_point.0.lat: 41.1199999647215 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using close_to to avoid these long mismatching numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's neat, thanks.

import java.io.IOException;

/**
* A parser that parses a document preserving fields that were parsed so far in a {@link XContentBuilder}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/preserving/copying/ to be more explicit.

I'd also note that this has performance overhead due to copying.

if (parser.currentToken() == XContentParser.Token.START_OBJECT
|| parser.currentToken() == XContentParser.Token.START_ARRAY) {
// We have a complex structure so we'll memorize it while parsing.
var memorizingParser = new MemorizingXContentParser(parser);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: assign directly to parserWithCustomization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parserWithCustomization and memorizingParser are different types.

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring!

throws IOException;

private void fetchFromSource(Object sourceMap, Consumer<T> consumer) {
try (XContentParser parser = wrapObject(sourceMap)) {
parse(parser, v -> consumer.accept(normalizeFromSource(v)), e -> {}); /* ignore malformed */
parse(parser, v -> consumer.accept(normalizeFromSource(v)), new NoopMalformedValueHandler()); /* ignore malformed */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for noop can we avoid a new instance every time? Maybe using a static instance?

- match: { _source.geo_point.0.lat: 41.1199999647215 }
- match: { _source.geo_point.1: "POINT (-77.03653 1000)" }


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test also the case where a geo point is empty? Or were only lat or lon are available? I would expect this to be reflected when reconstructing the document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test empty values in unit tests. I'll add a test for just one coordinate.

@lkts
Copy link
Contributor Author

lkts commented Jun 13, 2024

@elasticmachine update branch

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lkts lkts merged commit 5440f17 into elastic:main Jun 18, 2024
15 checks passed
@lkts lkts deleted the feature/geo_point_synthetic_source_ignore_malformed branch June 18, 2024 15:37
@felixbarny felixbarny mentioned this pull request Aug 6, 2024
50 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants