-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add enrich processor and enrich source meta field mapper. #41521
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The enrich processor uses a field value from the document being enriched and uses that to do a lookup in the locally allocated enrich index shard. If there is a match then retrieves the source of the enrich document from the enrich source field. This is a special binary doc values field. The document being enriched then gets values from the enrich document based on the configured decorate fields. The policy contains the information what field in the enrich index to query and what fields are available to decorate a document being enriched with. The enrich processor has the following configuration options: * `policy_name` - the name of the policy this processor should use * `enrich_key_field` - the field in the document being enriched that holds to lookup value * `enrich_key_field_ignore_missing` - Whether to allow the key field to be missing * `enrich_values` - a list of fields to decorate the document being enriched with. Each entry holds a source field and a target field. The source field indicates what decorate field to use that is available in the policy. The target field controls the field name to use in the document being enriched. The source and target fields can be the same. Example pipeline config: ``` { "processors": [ { "policy_name": "my_policy", "key": "host_name", "values": [ { "source": "globalRank", "target": "global_rank" } ] } ] } ``` In the above example documents are being enriched with a global rank value. For each document that has match in the enrich index based on its host_name field, the document gets an global rank field value, which is fetched from the `globalRank` field in the enrich index and saved as `global_rank` in the document being enriched. The enrich source field mapper is an internal field mapper meant to be used by enrich exclusively. Relates to elastic#32789
martijnvg
added
>non-issue
:Data Management/Ingest Node
Execution or management of Ingest Pipelines including GeoIP
labels
Apr 25, 2019
Pinging @elastic/es-core-features |
@elasticmachine run elasticsearch-ci/1 |
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
Apr 25, 2019
The enrich processor performs a lookup in a locally allocated enrich index shard using a field value from the document being enriched. If there is a match then the _source of the enrich document is fetched. The document being enriched then gets the decorate values from the enrich document based on the configured decorate fields in the pipeline. Note that the usage of the _source field is temporary until the enrich source field that is part of elastic#41521 is merged into the enrich branch. Using the _source field involves significant decompression which not desired for enrich use cases. The policy contains the information what field in the enrich index to query and what fields are available to decorate a document being enriched with. The enrich processor has the following configuration options: * `policy_name` - the name of the policy this processor should use * `enrich_key` - the field in the document being enriched that holds to lookup value * `enrich_key_ignore_missing` - Whether to allow the key field to be missing * `enrich_values` - a list of fields to decorate the document being enriched with. Each entry holds a source field and a target field. The source field indicates what decorate field to use that is available in the policy. The target field controls the field name to use in the document being enriched. The source and target fields can be the same. Example pipeline config: ``` { "processors": [ { "policy_name": "my_policy", "enrich_key": "host_name", "enrich_values": [ { "source": "globalRank", "target": "global_rank" } ] } ] } ``` In the above example documents are being enriched with a global rank value. For each document that has match in the enrich index based on its host_name field, the document gets an global rank field value, which is fetched from the `globalRank` field in the enrich index and saved as `global_rank` in the document being enriched. This is PR is part one of elastic#41521
Merged
martijnvg
added a commit
that referenced
this pull request
Apr 30, 2019
The enrich processor performs a lookup in a locally allocated enrich index shard using a field value from the document being enriched. If there is a match then the _source of the enrich document is fetched. The document being enriched then gets the decorate values from the enrich document based on the configured decorate fields in the pipeline. Note that the usage of the _source field is temporary until the enrich source field that is part of #41521 is merged into the enrich branch. Using the _source field involves significant decompression which not desired for enrich use cases. The policy contains the information what field in the enrich index to query and what fields are available to decorate a document being enriched with. The enrich processor has the following configuration options: * `policy_name` - the name of the policy this processor should use * `enrich_key` - the field in the document being enriched that holds to lookup value * `ignore_missing` - Whether to allow the key field to be missing * `enrich_values` - a list of fields to decorate the document being enriched with. Each entry holds a source field and a target field. The source field indicates what decorate field to use that is available in the policy. The target field controls the field name to use in the document being enriched. The source and target fields can be the same. Example pipeline config: ``` { "processors": [ { "policy_name": "my_policy", "enrich_key": "host_name", "enrich_values": [ { "source": "globalRank", "target": "global_rank" } ] } ] } ``` In the above example documents are being enriched with a global rank value. For each document that has match in the enrich index based on its host_name field, the document gets an global rank field value, which is fetched from the `globalRank` field in the enrich index and saved as `global_rank` in the document being enriched. This is PR is part one of #41521
martijnvg
added a commit
that referenced
this pull request
Apr 30, 2019
The enrich processor performs a lookup in a locally allocated enrich index shard using a field value from the document being enriched. If there is a match then the _source of the enrich document is fetched. The document being enriched then gets the decorate values from the enrich document based on the configured decorate fields in the pipeline. Note that the usage of the _source field is temporary until the enrich source field that is part of #41521 is merged into the enrich branch. Using the _source field involves significant decompression which not desired for enrich use cases. The policy contains the information what field in the enrich index to query and what fields are available to decorate a document being enriched with. The enrich processor has the following configuration options: * `policy_name` - the name of the policy this processor should use * `enrich_key` - the field in the document being enriched that holds to lookup value * `ignore_missing` - Whether to allow the key field to be missing * `enrich_values` - a list of fields to decorate the document being enriched with. Each entry holds a source field and a target field. The source field indicates what decorate field to use that is available in the policy. The target field controls the field name to use in the document being enriched. The source and target fields can be the same. Example pipeline config: ``` { "processors": [ { "policy_name": "my_policy", "enrich_key": "host_name", "enrich_values": [ { "source": "globalRank", "target": "global_rank" } ] } ] } ``` In the above example documents are being enriched with a global rank value. For each document that has match in the enrich index based on its host_name field, the document gets an global rank field value, which is fetched from the `globalRank` field in the enrich index and saved as `global_rank` in the document being enriched. This is PR is part one of #41521
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this pull request
May 23, 2019
The enrich source field mapper stores the source of a document as binary doc values. This is useful in cases where retrieval speeds are more important than compact storage (which is what SourceFieldMapper does), which is the case for the enrich processor. Prior to this change enrich processor was using _source stored field to fetch the enrich document to enrich document being ingested. The enrich policy runner, when creating the enrich index, disables _source meta field and enables the _enrich_source meta field. The enrich source field mapper is an internal field, which is only meant to be used by the enrich feature. Relates to elastic#41521 and elastic#32789
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The enrich processor uses a field value from the document being
enriched and uses that to do a lookup in the locally allocated
enrich index shard. If there is a match then retrieves the source
of the enrich document from the enrich source field. This is a special
binary doc values field. The document being enriched then gets values
from the enrich document based on the configured decorate fields.
The policy contains the information what field in the enrich index
to query and what fields are available to decorate a document being
enriched with.
The enrich processor has the following configuration options:
policy_name
- the name of the policy this processor should useenrich_key_field
- the field in the document being enriched that holds to lookup valueenrich_key_field_ignore_missing
- Whether to allow the key field to be missingenrich_values
- a list of fields to decorate the document being enriched with.Each entry holds a source field and a target field.
The source field indicates what decorate field to use that is available in the policy.
The target field controls the field name to use in the document being enriched.
The source and target fields can be the same.
Example pipeline config:
In the above example documents are being enriched with a global rank value.
For each document that has match in the enrich index based on its host_name field,
the document gets an global rank field value, which is fetched from the
globalRank
field in the enrich index and saved as
global_rank
in the document being enriched.The enrich source field mapper is an internal field mapper meant to be
used by enrich exclusively.
Relates to #32789
I will open this PR as draft, because I will split this PR in two smaller PRs. One PR around the enrich processor and one pr around the enrich source field mapper. I think this will make the reviewing easier.
cc: @jakelandis @hub-cap @jbaiera @jpountz