Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eagerly check keyword field length #83738

Merged
merged 8 commits into from
Feb 15, 2022
Merged

Eagerly check keyword field length #83738

merged 8 commits into from
Feb 15, 2022

Conversation

kkewwei
Copy link
Contributor

@kkewwei kkewwei commented Feb 9, 2022

If the UTF8 encoding of a keyword field value is bigger than the max length 32766, Lucene fill fail the indexing request and, to roll back the changes, will mark the (possibly partially indexed) document as deleted. This results in deletes, even in an append-only workload, which in turn leads to slower merges, as these will potentially have to fall back to MergeStrategy.DOC instead of MergeStrategy.BULK. To avoid this, we do a preflight check here before indexing the document into Lucene.

Relates #80865

@elasticsearchmachine elasticsearchmachine added v8.2.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Feb 9, 2022
@pquentin pquentin added the :Search/Search Search-related issues that do not fall into other categories label Feb 14, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Feb 14, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR. I've left some comments.

@kkewwei
Copy link
Contributor Author

kkewwei commented Feb 15, 2022

TBR

@ywelsch
Copy link
Contributor

ywelsch commented Feb 15, 2022

@elasticmachine update branch
@elasticmachine test this please

@ywelsch ywelsch self-assigned this Feb 15, 2022
@ywelsch ywelsch added >enhancement and removed >bug labels Feb 15, 2022
@ywelsch
Copy link
Contributor

ywelsch commented Feb 15, 2022

@elasticmachine test this please

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've asked for the comment to be changed, otherwise LGTM

@ywelsch
Copy link
Contributor

ywelsch commented Feb 15, 2022

@elasticmachine test this please

@ywelsch ywelsch changed the title ES will discard the doc if the length of keyword field is bigger than 32766 before writing to Lucene Eagerly check keyword field length Feb 15, 2022
@ywelsch ywelsch merged commit efb76d8 into elastic:master Feb 15, 2022
@ywelsch
Copy link
Contributor

ywelsch commented Feb 15, 2022

Thank you @kkewwei!

@elasticsearchmachine
Copy link
Collaborator

@kkewwei please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.

@kkewwei kkewwei deleted the fix_80865 branch February 16, 2022 03:07
@itizir
Copy link

itizir commented Apr 12, 2023

Hello!

Just noticed an issue around this: the exception thrown by KeywordFieldMapper is actually overridden by FieldMapper, so instead of getting the actual reason why indexing failed with a limited (30B) preview of the field value as before, the client gets a generic error message containing the whole field value. Not ideal in logs, and not very helpful as it obscures the root cause of the failure...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants