Add doc_count field mapper #64503

csoulios · 2020-11-02T20:06:29Z

Bucket aggregations compute bucket doc_count values by incrementing the doc_count by 1 for every document collected in the bucket.

When using summary fields (such as aggregate_metric_double) one field may represent more than one document. To provide this functionality we have implemented a new field mapper (named doc_count field mapper). This field is a positive integer representing the number of documents aggregated in a single summary field.

Bucket aggregations will check if a field of type doc_count exists in a document and will take this value into consideration when computing doc counts.

Note: This PR is the same as #58339 but has been opened against master branch. For the complete discussion on this PR check #58339

added tests Build fixes Added tests for doc_count fieldmapper doc count tests Resolve conflicts after merge from master Added yaml test for doc_count field type Minor changes to test Fix issue with not-registering field mapper Simplify terms agg test Add doc_count provider in the buckets aggregator Initialize doc_count provider once doc_count provider is initialized once in the buckets agg constructor. Added tests for FieldBasedDocCountProvider Added more tests to DocCountFieldMapper Updated branch to fix build after merge Added validation for single doc_count field Added validation to ensure only a single doc_count field exists in the mapping Added version skips to fix broken tests Added documentation for doc_count Changes to address review comments: - Minor doc change - Added yml test that merges template with multiple doc_count - Changed DocCountFieldType indexing to TextSearchInfo.NONE Use _doc_count as Lucene field for doc count Minor change: field rename Minor change to yml test. Fix errors from merge Converted _doc_count to metadata field type Throw an error if parsed value is not a number Make _doc_count field a metadata field Fixed broken tests Fix bug in low cardinality ordinal terms aggs Update docs that _doc_count is a metadata field Fix broken ML tests Fix errors after merge Addressed review comments Addressed reviewer comments Added DocCountFieldTypeTests Make composite agg respect _doc_count field Cleaned up/simplified DocCountProvider class DocCountProvider rethrows IOException instead of swallowing it Set familyTypeName of _doc_count to integer Revert changes to AggregatorTestCase Doc changes

elasticmachine · 2020-11-02T20:06:32Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

elasticmachine · 2020-11-02T20:06:32Z

Pinging @elastic/es-search (:Search/Mapping)

jimczi

LGTM!

csoulios · 2020-11-03T13:51:33Z

@elasticmachine run elasticsearch-ci/bwc

csoulios · 2020-11-03T15:05:27Z

@elasticmachine run elasticsearch-ci/bwc

This field is added from version 7.11 onwards. We are adding it to the list of ignored fields for data frame analytics in 7.10 to avoid failing to start an outlier detection job in a mixed cluster environment. Relates #64503

Because of elastic#64503. This adds to elastic#64540.

Because of #64503. This adds to #64540.

Relates elastic#64503.

Relates #64503.

A doc_count field mapper was added in elastic/elasticsearch#64503 This is currently failing the ES promotion. After this promotion, we can go back and update this test to be an exact match on the body if we think that is the desired assertion. Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

…82547) * [test] Updates rollup test to allow incoming doc_count field mapper A doc_count field mapper was added in elastic/elasticsearch#64503 This is currently failing the ES promotion. After this promotion, we can go back and update this test to be an exact match on the body if we think that is the desired assertion. Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

…82547) (#82554) * [test] Updates rollup test to allow incoming doc_count field mapper A doc_count field mapper was added in elastic/elasticsearch#64503 This is currently failing the ES promotion. After this promotion, we can go back and update this test to be an exact match on the body if we think that is the desired assertion. Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co> Co-authored-by: Tyler Smalley <tyler.smalley@elastic.co>

After merging _doc_count PR in 7.x (elastic#64594), we can enable the skipped ML dataframe analytics. Those tests had been broken after merging elastic#64503 in master

After merging _doc_count PR in 7.x (#64594), we can enable the skipped ML dataframe analytics. Those tests had been broken after merging #64503 in master

jpountz · 2020-11-25T13:44:17Z

server/src/main/java/org/elasticsearch/index/mapper/DocCountFieldMapper.java

+
+        long value = parser.longValue(false);
+        if (value <= 0) {
+            throw new IllegalArgumentException("Field [" + fieldType().name() + "] must be a positive integer.");


maybe add the value that was parsed to the error message? this would make issues easier to investigate

@jpountz thanks for the input. Now that this has been merged, I will push this change in a subsequent PR.

After merging _doc_count field type in v7.11.0 (#64594), this PR lowers the minimum compatibility version from v8.0.0 to v7.11.0 Relates to #64503

A while back, Lucene introduced the ability to index custom term frequencies, ie. giving users the ability to provide a numeric value that should be indexed as a term frequency rather than letting Lucene compute the term frequency by itself based on the number of occurrences of a term. This PR modifies the _doc_count field so that it is stored as Lucene custom term frequency. A benefit of moving to custom term frequencies is that Lucene will automatically compute global term statistics like totalTermFreq which will let us know the sum of the values of the _doc_count field across an entire shard. This could in-turn be useful to generalize optimizations to rollup indices, e.g. buckets aggregations where all documents fall into the same bucket. Relates to #64503

csoulios added :Analytics/Aggregations Aggregations :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 labels Nov 2, 2020

csoulios requested a review from jimczi November 2, 2020 20:06

elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team labels Nov 2, 2020

csoulios changed the title ~~Initial version of doc_count field mapper~~ Add doc_count field mapper Nov 2, 2020

csoulios mentioned this pull request Nov 2, 2020

Add doc_count field mapper #58339

Closed

csoulios added 3 commits November 2, 2020 22:35

Exclude _doc_count from SQL results that made tests fail

a46c8ea

Converted list to set, since contains() is used

29c39bb

Changed family type of doc_count from int to long

ad67789

jimczi approved these changes Nov 3, 2020

View reviewed changes

csoulios added 2 commits November 3, 2020 14:56

Added _doc_count to SQL IndexResolverTests

b69c5e9

Merge branch 'master' into doc_count-field-mapper_master

330d2b7

csoulios added the v7.11.0 label Nov 3, 2020

csoulios merged commit 4dc833f into elastic:master Nov 3, 2020

csoulios deleted the doc_count-field-mapper_master branch November 3, 2020 15:47

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Nov 3, 2020

[ML] Further mutes on DFA upgrade tests

4ac7b68

Because of elastic#64503. This adds to elastic#64540.

dimitris-athanasiou mentioned this pull request Nov 3, 2020

[ML] Further mutes on DFA upgrade tests #64545

Merged

dimitris-athanasiou added a commit that referenced this pull request Nov 3, 2020

[ML] Further mutes on DFA upgrade tests (#64545)

887c98e

Because of #64503. This adds to #64540.

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Nov 3, 2020

[ML] Actually fix DFA mixed cluster tests

6ffd24c

Relates elastic#64503.

dimitris-athanasiou mentioned this pull request Nov 3, 2020

[ML] Actually mute DFA mixed cluster tests #64551

Merged

dimitris-athanasiou added a commit that referenced this pull request Nov 3, 2020

[ML] Actually fix DFA mixed cluster tests (#64551)

607713f

Relates #64503.

tylersmalley mentioned this pull request Nov 4, 2020

[test] Updates rollup test to allow incoming doc_count field mapper elastic/kibana#82547

Merged

$@polyfractal$ polyfractal added the >new-field-mapper Added when a new field type / mapper is being introduced label Nov 4, 2020

csoulios mentioned this pull request Nov 4, 2020

[7.x] Add doc_count field mapper #64594

Merged

csoulios mentioned this pull request Nov 6, 2020

Enable ML dataframe analytics tests #64723

Merged

csoulios added a commit that referenced this pull request Nov 6, 2020

Enable ML dataframe analytics tests (#64723)

08b4c29

After merging _doc_count PR in 7.x (#64594), we can enable the skipped ML dataframe analytics. Those tests had been broken after merging #64503 in master

jpountz reviewed Nov 25, 2020

View reviewed changes

This was referenced Dec 2, 2020

Store _doc_count field as custom term frequency #65776

Merged

Lower minimum compatibility version for _doc_count field tests #65790

Merged

csoulios mentioned this pull request Dec 3, 2020

[7.x] Lower minimum compatibility version of _doc_count field tests #65805

Merged

csoulios mentioned this pull request Dec 3, 2020

[7.x] Store _doc_count field as custom term frequency #65825

Merged

pugnascotia added the >feature label Dec 16, 2020

stevejgordon mentioned this pull request Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add doc_count field mapper #64503

Add doc_count field mapper #64503

csoulios commented Nov 2, 2020 •

edited

Loading

elasticmachine commented Nov 2, 2020

elasticmachine commented Nov 2, 2020

jimczi left a comment

csoulios commented Nov 3, 2020

csoulios commented Nov 3, 2020

jpountz Nov 25, 2020

csoulios Dec 2, 2020

Add doc_count field mapper #64503

Add doc_count field mapper #64503

Conversation

csoulios commented Nov 2, 2020 • edited Loading

elasticmachine commented Nov 2, 2020

elasticmachine commented Nov 2, 2020

jimczi left a comment

Choose a reason for hiding this comment

csoulios commented Nov 3, 2020

csoulios commented Nov 3, 2020

jpountz Nov 25, 2020

Choose a reason for hiding this comment

csoulios Dec 2, 2020

Choose a reason for hiding this comment

csoulios commented Nov 2, 2020 •

edited

Loading