-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ingest-attachment support for per document indexed_chars
limit
#31352
Conversation
We today support a global `indexed_chars` processor parameter. But in some cases, users would like to set this limit depending on the document itself. It used to be supported in mapper-attachments plugin by extracting the limit value from a meta field in the document sent to indexation process. We add an option which reads this limit value from the document itself by adding a setting named `indexed_chars_field`. Which allows running: ``` PUT _ingest/pipeline/attachment { "description" : "Extract attachment information. Used to parse pdf and office files", "processors" : [ { "attachment" : { "field" : "data", "indexed_chars_field" : "size" } } ] } ``` Then index either: ``` PUT index/doc/1?pipeline=attachment { "data": "BASE64" } ``` Which will use the default value (or the one defined by `indexed_chars`) Or ``` PUT index/doc/2?pipeline=attachment { "data": "BASE64", "size": 1000 } ``` Backport of elastic#28977 in 6.x branch (6.4.0)
Pinging @elastic/es-core-infra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@dadoonet The CI build failed because if a checkstyle violation: |
* 6.x: Add get stored script and delete stored script to high level REST API Increasing skip version for failing test on 6.x Skip get_alias tests for 5.x (#31397) Fix defaults in GeoShapeFieldMapper output (#31302) Test: better error message on failure Mute DefaultShardsIT#testDefaultShards test Fix reference to XContentBuilder.string() (#31337) [DOCS] Adds monitoring breaking change (#31369) [DOCS] Adds security breaking change (#31375) [DOCS] Backports breaking change (#31373) RestAPI: Reject forcemerge requests with a body (#30792) Docs: Use the default distribution to test docs (#31251) Use system context for cluster state update tasks (#31241) [DOCS] Adds testing for security APIs (#31345) [DOCS] Removes ML item from release highlights [DOCS] Removes breaking change (#31376) REST high-level client: add validate query API (#31077) Move language analyzers from server to analysis-common module. (#31300) Expose lucene's RemoveDuplicatesTokenFilter (#31275) [Test] Fix :example-plugins:rest-handler on Windows Delete typos in SAML docs (#31199) Ensure we don't use a remote profile if cluster name matches (#31331) Test: Skip alias tests that failed all weekend [DOCS] Fix version in SQL JDBC Maven template [DOCS] Improve install and setup section for SQL JDBC Add ingest-attachment support for per document `indexed_chars` limit (#31352) SQL: Fix rest endpoint names in node stats (#31371) [DOCS] Fixes small issue in release notes Support for remote path in reindex api Closes #22913 [ML] Put ML filter API response should contain the filter (#31362) Remove trial status info from start trial doc (#31365) [DOCS] Added links in breaking changes pages [DOCS] Adds links to release notes and highlights Docs: Document changes in rest client QA: Fix tribe tests to use node selector REST Client: NodeSelector for node attributes (#31296) LLClient: Fix assertion on windows LLClient: Support host selection (#30523) Add QA project and fixture based test for discovery-ec2 plugin (#31107) [ML] Hold ML filter items in sorted set (#31338) [ML] Add description to ML filters (#31330)
Note: this is a backport of #28977 which does not require a review.
As I forgot to backport it, I just like to double check that CI will be happy.
I'll update the PR if CI is unhappy and then will ask for a review.
We today support a global
indexed_chars
processor parameter. But in some cases, users would like to set this limit depending on the document itself.It used to be supported in mapper-attachments plugin by extracting the limit value from a meta field in the document sent to indexation process.
We add an option which reads this limit value from the document itself
by adding a setting named
indexed_chars_field
.Which allows running:
Then index either:
Which will use the default value (or the one defined by
indexed_chars
)Or
Backport of #28977 in 6.x branch (6.4.0)