Bucket aggregations compute bucket doc_count values by incrementing the doc_count by 1 for every document collected in the bucket. When using summary fields (such as aggregate_metric_double) one field may represent more than one document. To provide this functionality we have implemented a new field mapper (named doc_count field mapper). This field is a positive integer representing the number of documents aggregated in a single summary field. Bucket aggregations will check if a field of type doc_count exists in a document and will take this value into consideration when computing doc counts.
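The counting rule described above can be sketched in a few lines. This is an illustrative Python model, not the Java code from the commit: the function name `bucket_doc_counts` and the plain-dict representation of documents are assumptions made for the example.

```python
from collections import defaultdict


def bucket_doc_counts(docs, field):
    """Group docs by `field`, adding each doc's _doc_count to its bucket.

    Hypothetical helper: a document without _doc_count counts as 1.
    """
    counts = defaultdict(int)
    for doc in docs:
        if field not in doc:
            continue  # the document does not fall into any bucket
        weight = doc.get("_doc_count", 1)
        # _doc_count must be a positive integer (bool is excluded explicitly)
        if isinstance(weight, bool) or not isinstance(weight, int) or weight <= 0:
            raise ValueError("_doc_count must be a positive integer")
        counts[doc[field]] += weight
    return dict(counts)


docs = [
    {"my_text": "histogram_1", "_doc_count": 45},
    {"my_text": "histogram_2", "_doc_count": 62},
    {"my_text": "histogram_2"},  # no _doc_count, so it counts as 1
]
print(bucket_doc_counts(docs, "my_text"))
# {'histogram_1': 45, 'histogram_2': 63}
```

The third document is added only to show the default: it has no `_doc_count`, so it contributes 1 to the `histogram_2` bucket.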
Showing 22 changed files with 786 additions and 63 deletions.
@@ -0,0 +1,118 @@
[[mapping-doc-count-field]]
=== `_doc_count` data type
++++
<titleabbrev>_doc_count</titleabbrev>
++++

Bucket aggregations always return a field named `doc_count` showing the number of documents that were aggregated and partitioned
in each bucket. The computation is simple: `doc_count` is incremented by 1 for every document collected
in each bucket.

While this simple approach is effective when computing aggregations over individual documents, it fails to accurately represent
documents that store pre-aggregated data (such as `histogram` or `aggregate_metric_double` fields), because one summary field may
represent multiple documents.

To allow for correct computation of the number of documents when working with pre-aggregated data, we have introduced a
metadata field type named `_doc_count`. `_doc_count` must always be a positive integer representing the number of documents
aggregated in a single summary field.

When a `_doc_count` field is added to a document, all bucket aggregations will respect its value and increment the bucket `doc_count`
by the value of the field. If a document does not contain a `_doc_count` field, `_doc_count = 1` is implied by default.

[IMPORTANT]
========
* A `_doc_count` field can only store a single positive integer per document. Nested arrays are not allowed.
* If a document contains no `_doc_count` field, aggregators will increment by 1, which is the default behavior.
========

[[mapping-doc-count-field-example]]
==== Example

The following <<indices-create-index, create index>> API request creates a new index with the following field mappings:

* `my_histogram`, a `histogram` field used to store percentile data
* `my_text`, a `keyword` field used to store a title for the histogram

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
--------------------------------------------------

The following <<docs-index_,index>> API requests store pre-aggregated data for
two histograms: `histogram_1` and `histogram_2`.

[source,console]
--------------------------------------------------
PUT my_index/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  },
  "_doc_count": 45 <1>
}
PUT my_index/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts" : [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 62 <1>
}
--------------------------------------------------
<1> Field `_doc_count` must be a positive integer storing the number of documents aggregated to produce each histogram.

If we run the following <<search-aggregations-bucket-terms-aggregation, terms aggregation>> on `my_index`:

[source,console]
--------------------------------------------------
GET /_search
{
  "aggs" : {
    "histogram_titles" : {
      "terms" : { "field" : "my_text" }
    }
  }
}
--------------------------------------------------

We will get the following response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations" : {
    "histogram_titles" : {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets" : [
        {
          "key" : "histogram_2",
          "doc_count" : 62
        },
        {
          "key" : "histogram_1",
          "doc_count" : 45
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:test not setup]
150 changes: 150 additions & 0 deletions
...api-spec/src/main/resources/rest-api-spec/test/search.aggregation/370_doc_count_field.yml
@@ -0,0 +1,150 @@
setup:
  - do:
      indices.create:
        index: test_1
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              str:
                type: keyword
              number:
                type: integer

  - do:
      bulk:
        index: test_1
        refresh: true
        body:
          - '{"index": {}}'
          - '{"_doc_count": 10, "str": "abc", "number" : 500, "unmapped": "abc" }'
          - '{"index": {}}'
          - '{"_doc_count": 5, "str": "xyz", "number" : 100, "unmapped": "xyz" }'
          - '{"index": {}}'
          - '{"_doc_count": 7, "str": "foo", "number" : 100, "unmapped": "foo" }'
          - '{"index": {}}'
          - '{"_doc_count": 1, "str": "foo", "number" : 200, "unmapped": "foo" }'
          - '{"index": {}}'
          - '{"str": "abc", "number" : 500, "unmapped": "abc" }'

---
"Test numeric terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"

  - do:
      search:
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" : { "num_terms" : { "terms" : { "field" : "number" } } } }

  - match: { hits.total: 5 }
  - length: { aggregations.num_terms.buckets: 3 }
  - match: { aggregations.num_terms.buckets.0.key: 100 }
  - match: { aggregations.num_terms.buckets.0.doc_count: 12 }
  - match: { aggregations.num_terms.buckets.1.key: 500 }
  - match: { aggregations.num_terms.buckets.1.doc_count: 11 }
  - match: { aggregations.num_terms.buckets.2.key: 200 }
  - match: { aggregations.num_terms.buckets.2.doc_count: 1 }


---
"Test keyword terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      search:
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" : { "str_terms" : { "terms" : { "field" : "str" } } } }

  - match: { hits.total: 5 }
  - length: { aggregations.str_terms.buckets: 3 }
  - match: { aggregations.str_terms.buckets.0.key: "abc" }
  - match: { aggregations.str_terms.buckets.0.doc_count: 11 }
  - match: { aggregations.str_terms.buckets.1.key: "foo" }
  - match: { aggregations.str_terms.buckets.1.doc_count: 8 }
  - match: { aggregations.str_terms.buckets.2.key: "xyz" }
  - match: { aggregations.str_terms.buckets.2.doc_count: 5 }

---

"Test unmapped string terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      bulk:
        index: test_2
        refresh: true
        body:
          - '{"index": {}}'
          - '{"_doc_count": 10, "str": "abc" }'
          - '{"index": {}}'
          - '{"str": "abc" }'
  - do:
      search:
        index: test_2
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" : { "str_terms" : { "terms" : { "field" : "str.keyword" } } } }

  - match: { hits.total: 2 }
  - length: { aggregations.str_terms.buckets: 1 }
  - match: { aggregations.str_terms.buckets.0.key: "abc" }
  - match: { aggregations.str_terms.buckets.0.doc_count: 11 }

---
"Test composite str_terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      search:
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" :
                { "composite_agg" : { "composite" :
                  {
                    "sources": ["str_terms": { "terms": { "field": "str" } }]
                  }
                }
                }
              }

  - match: { hits.total: 5 }
  - length: { aggregations.composite_agg.buckets: 3 }
  - match: { aggregations.composite_agg.buckets.0.key.str_terms: "abc" }
  - match: { aggregations.composite_agg.buckets.0.doc_count: 11 }
  - match: { aggregations.composite_agg.buckets.1.key.str_terms: "foo" }
  - match: { aggregations.composite_agg.buckets.1.doc_count: 8 }
  - match: { aggregations.composite_agg.buckets.2.key.str_terms: "xyz" }
  - match: { aggregations.composite_agg.buckets.2.doc_count: 5 }


---
"Test composite num_terms agg with doc_count":
  - skip:
      version: " - 7.99.99"
      reason: "Doc count fields are only implemented in 8.0"
  - do:
      search:
        rest_total_hits_as_int: true
        body: { "size" : 0, "aggs" :
                { "composite_agg" :
                  { "composite" :
                    {
                      "sources": ["num_terms" : { "terms" : { "field" : "number" } }]
                    }
                  }
                }
              }

  - match: { hits.total: 5 }
  - length: { aggregations.composite_agg.buckets: 3 }
  - match: { aggregations.composite_agg.buckets.0.key.num_terms: 100 }
  - match: { aggregations.composite_agg.buckets.0.doc_count: 12 }
  - match: { aggregations.composite_agg.buckets.1.key.num_terms: 200 }
  - match: { aggregations.composite_agg.buckets.1.doc_count: 1 }
  - match: { aggregations.composite_agg.buckets.2.key.num_terms: 500 }
  - match: { aggregations.composite_agg.buckets.2.doc_count: 11 }
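
The expected values in the assertions above can be cross-checked by hand against the five fixture documents indexed in `setup`. The Python sketch below is only a worked check, not part of the test suite: `weighted_counts`, `terms_order`, and `composite_order` are hypothetical helpers, and the two orderings simply encode what the assertions imply (terms buckets by descending `doc_count`, composite buckets by ascending key).

```python
from collections import Counter

# The five documents from the `setup` bulk request (unmapped field omitted).
FIXTURE = [
    {"_doc_count": 10, "str": "abc", "number": 500},
    {"_doc_count": 5, "str": "xyz", "number": 100},
    {"_doc_count": 7, "str": "foo", "number": 100},
    {"_doc_count": 1, "str": "foo", "number": 200},
    {"str": "abc", "number": 500},  # no _doc_count, counts as 1
]


def weighted_counts(docs, field):
    """Sum _doc_count (default 1) per distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        counts[doc[field]] += doc.get("_doc_count", 1)
    return counts


def terms_order(counts):
    """Order implied by the terms tests: doc_count descending, then key."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def composite_order(counts):
    """Order implied by the composite tests: key ascending."""
    return sorted(counts.items())


num = weighted_counts(FIXTURE, "number")
print(terms_order(num))      # [(100, 12), (500, 11), (200, 1)]
print(composite_order(num))  # [(100, 12), (200, 1), (500, 11)]
print(terms_order(weighted_counts(FIXTURE, "str")))
# [('abc', 11), ('foo', 8), ('xyz', 5)]
```

Note that `hits.total` stays at 5 in every test: `_doc_count` changes bucket counts only, not the number of matching documents.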