Add enrich processor and enrich source meta field mapper. #41521

Closed
martijnvg wants to merge 2 commits

martijnvg
Member

The enrich processor takes a field value from the document being
enriched and uses it to do a lookup in the locally allocated
enrich index shard. If there is a match, it retrieves the source
of the enrich document from the enrich source field, which is a special
binary doc values field. The document being enriched then gets values
from the enrich document based on the configured decorate fields.

The policy specifies which field in the enrich index to query and
which fields are available to decorate the document being enriched.

The enrich processor has the following configuration options:

  • policy_name - the name of the policy this processor should use
  • enrich_key_field - the field in the document being enriched that holds the lookup value
  • enrich_key_field_ignore_missing - whether to allow the key field to be missing
  • enrich_values - a list of fields to decorate the document being enriched with.
    Each entry holds a source field and a target field.
    The source field names one of the decorate fields available in the policy.
    The target field sets the field name to use in the document being enriched.
    The source and target fields can be the same.
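
How the processor's `enrich_values` relate to the policy can be sketched in Python. This is only an illustration: the `EnrichPolicy` class, its attribute names, and the `unavailable_sources` helper are hypothetical and not taken from this PR. The sketch simply assumes that each `source` has to be one of the decorate fields the policy offers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EnrichPolicy:
    # Hypothetical shape: the real policy format is not shown in this PR description.
    name: str
    match_field: str            # field in the enrich index to query
    decorate_fields: List[str]  # fields available to decorate a document with


def unavailable_sources(policy: EnrichPolicy,
                        enrich_values: List[Dict[str, str]]) -> List[str]:
    """Return the `source` fields that the policy does not make available."""
    available = set(policy.decorate_fields)
    return [entry["source"] for entry in enrich_values
            if entry["source"] not in available]


# Sample policy with made-up decorate fields.
policy = EnrichPolicy(name="my_policy",
                      match_field="host_name",
                      decorate_fields=["globalRank", "tld"])

# An empty result means every source field is offered by the policy.
problems = unavailable_sources(
    policy, [{"source": "globalRank", "target": "global_rank"}])
```

A check like this would let a misconfigured processor be rejected when the pipeline is created rather than when the first document is ingested, although whether the PR validates at that point is not stated here.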

Example pipeline config:

{
   "processors": [
      {
         "enrich": {
            "policy_name": "my_policy",
            "enrich_key_field": "host_name",
            "enrich_values": [
               {
                 "source": "globalRank",
                 "target": "global_rank"
               }
            ]
         }
      }
   ]
}

In the above example, documents are enriched with a global rank value.
For each document that has a match in the enrich index based on its host_name field,
the document gets a global rank value, which is fetched from the globalRank
field in the enrich index and saved as global_rank in the document being enriched.
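
The lookup-and-decorate flow described above can be sketched in Python. This is a simplified model, not the processor's implementation: a plain dict keyed by lookup value stands in for the enrich index shard, the function name `enrich` is made up, the sample rank is invented, and raising an error when the key field is missing (unless `enrich_key_field_ignore_missing` is set) is an assumption about the intended behaviour.

```python
from typing import Any, Dict


def enrich(doc: Dict[str, Any],
           config: Dict[str, Any],
           enrich_index: Dict[Any, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of `doc` decorated with values from the matching enrich document."""
    key_field = config["enrich_key_field"]
    ignore_missing = config.get("enrich_key_field_ignore_missing", False)

    if key_field not in doc:
        if ignore_missing:
            return dict(doc)  # nothing to look up, pass the document through
        # Assumed behaviour: a missing key field is an error unless ignored.
        raise KeyError(f"field [{key_field}] is missing from the document")

    match = enrich_index.get(doc[key_field])
    if match is None:
        return dict(doc)  # no match in the enrich index, document is unchanged

    result = dict(doc)
    for entry in config["enrich_values"]:
        # Copy each configured source field of the enrich document to its target field.
        if entry["source"] in match:
            result[entry["target"]] = match[entry["source"]]
    return result


config = {
    "policy_name": "my_policy",
    "enrich_key_field": "host_name",
    "enrich_values": [{"source": "globalRank", "target": "global_rank"}],
}
# Stand-in for the enrich index; the rank value is made up.
enrich_index = {"elastic.co": {"globalRank": 25, "tld": "co"}}

print(enrich({"host_name": "elastic.co"}, config, enrich_index))
# prints {'host_name': 'elastic.co', 'global_rank': 25}
```

Only the fields listed in `enrich_values` are copied, so `tld` stays out of the enriched document even though the enrich document contains it.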

The enrich source field mapper is an internal field mapper meant to be
used by enrich exclusively.

Relates to #32789

I will open this PR as a draft, because I will split it into two smaller PRs: one for the enrich processor and one for the enrich source field mapper. I think this will make reviewing easier.

cc: @jakelandis @hub-cap @jbaiera @jpountz

@martijnvg martijnvg added the labels >non-issue and :Data Management/Ingest Node (Execution or management of Ingest Pipelines including GeoIP) Apr 25, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@martijnvg
Member Author

@elasticmachine run elasticsearch-ci/1

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Apr 25, 2019
The enrich processor performs a lookup in a locally allocated
enrich index shard using a field value from the document being enriched.
If there is a match then the _source of the enrich document is fetched.
The document being enriched then gets the decorate values from the
enrich document based on the configured decorate fields in the pipeline.

Note that the usage of the _source field is temporary until the enrich
source field that is part of elastic#41521 is merged into the enrich branch.
Using the _source field involves significant decompression, which is not
desirable for enrich use cases.

The policy specifies which field in the enrich index to query and
which fields are available to decorate the document being enriched.

The enrich processor has the following configuration options:
* `policy_name` - the name of the policy this processor should use
* `enrich_key` - the field in the document being enriched that holds the lookup value
* `enrich_key_ignore_missing` - whether to allow the key field to be missing
* `enrich_values` - a list of fields to decorate the document being enriched with.
                    Each entry holds a source field and a target field.
                    The source field indicates what decorate field to use that is available in the policy.
                    The target field controls the field name to use in the document being enriched.
                    The source and target fields can be the same.

Example pipeline config:

```
{
   "processors": [
      {
         "enrich": {
            "policy_name": "my_policy",
            "enrich_key": "host_name",
            "enrich_values": [
               {
                 "source": "globalRank",
                 "target": "global_rank"
               }
            ]
         }
      }
   ]
}
```

In the above example, documents are enriched with a global rank value.
For each document that has a match in the enrich index based on its `host_name` field,
the document gets a global rank value, which is fetched from the `globalRank`
field in the enrich index and saved as `global_rank` in the document being enriched.

This PR is part one of elastic#41521
@martijnvg martijnvg mentioned this pull request Apr 25, 2019
martijnvg added a commit that referenced this pull request Apr 30, 2019
The enrich processor performs a lookup in a locally allocated
enrich index shard using a field value from the document being enriched.
If there is a match then the _source of the enrich document is fetched.
The document being enriched then gets the decorate values from the
enrich document based on the configured decorate fields in the pipeline.

Note that the usage of the _source field is temporary until the enrich
source field that is part of #41521 is merged into the enrich branch.
Using the _source field involves significant decompression, which is not
desirable for enrich use cases.

The policy specifies which field in the enrich index to query and
which fields are available to decorate the document being enriched.

The enrich processor has the following configuration options:
* `policy_name` - the name of the policy this processor should use
* `enrich_key` - the field in the document being enriched that holds the lookup value
* `ignore_missing` - whether to allow the key field to be missing
* `enrich_values` - a list of fields to decorate the document being enriched with.
                    Each entry holds a source field and a target field.
                    The source field indicates what decorate field to use that is available in the policy.
                    The target field controls the field name to use in the document being enriched.
                    The source and target fields can be the same.

Example pipeline config:

```
{
   "processors": [
      {
         "enrich": {
            "policy_name": "my_policy",
            "enrich_key": "host_name",
            "enrich_values": [
               {
                 "source": "globalRank",
                 "target": "global_rank"
               }
            ]
         }
      }
   ]
}
```

In the above example, documents are enriched with a global rank value.
For each document that has a match in the enrich index based on its `host_name` field,
the document gets a global rank value, which is fetched from the `globalRank`
field in the enrich index and saved as `global_rank` in the document being enriched.

This PR is part one of #41521
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 23, 2019
The enrich source field mapper stores the source of a document as
binary doc values. This is useful when retrieval speed matters more
than compact storage (which is what SourceFieldMapper optimizes for),
as is the case for the enrich processor.
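
The trade-off can be illustrated with a toy model in Python. It is not Lucene's or Elasticsearch's actual storage format: it assumes that stored fields compress several documents together in one block, so reading one document means decompressing the block, and it models binary doc values as one uncompressed byte string per document. All function names are made up for this sketch.

```python
import json
import zlib
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def build_compressed_block(docs: List[Doc]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Compact layout: serialize all documents and compress them as one block."""
    payloads = [json.dumps(d).encode("utf-8") for d in docs]
    offsets = []
    position = 0
    for payload in payloads:
        offsets.append((position, len(payload)))  # (start, length) inside the raw block
        position += len(payload)
    return zlib.compress(b"".join(payloads)), offsets


def read_from_block(block: bytes, offsets: List[Tuple[int, int]], doc_id: int) -> Doc:
    """Reading one document requires decompressing the whole block first."""
    raw = zlib.decompress(block)
    start, length = offsets[doc_id]
    return json.loads(raw[start:start + length].decode("utf-8"))


def build_per_doc_values(docs: List[Doc]) -> List[bytes]:
    """Fast-retrieval layout: one uncompressed byte string per document."""
    return [json.dumps(d).encode("utf-8") for d in docs]


def read_from_values(values: List[bytes], doc_id: int) -> Doc:
    """Reading one document is a direct lookup with no decompression."""
    return json.loads(values[doc_id].decode("utf-8"))


# Made-up sample data.
docs = [{"host_name": f"host-{i}", "globalRank": i} for i in range(100)]
block, offsets = build_compressed_block(docs)
values = build_per_doc_values(docs)

# Both layouts return the same document; they differ in size and read cost.
assert read_from_block(block, offsets, 42) == read_from_values(values, 42)
print(len(block), sum(len(v) for v in values))
```

In this model the compressed block is smaller, but every single-document read pays for decompressing all the neighbouring documents as well, which is the cost the enrich lookup path wants to avoid.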

Prior to this change, the enrich processor used the _source stored field
to fetch the enrich document used to enrich the document being ingested.

The enrich policy runner, when creating the enrich index, disables
the _source meta field and enables the _enrich_source meta field.

The enrich source field mapper is an internal field, which is only
meant to be used by the enrich feature.

Relates to elastic#41521 and elastic#32789
@martijnvg
Member Author

Superseded by #42423 and #41532

@martijnvg martijnvg closed this May 23, 2019