elastic · jtibshirani · Nov 20, 2020 · Oct 29, 2020 · Oct 29, 2020 · Nov 2, 2020
diff --git a/docs/plugins/mapper-annotated-text.asciidoc b/docs/plugins/mapper-annotated-text.asciidoc
@@ -18,7 +18,7 @@ include::install_remove.asciidoc[]
 [[mapper-annotated-text-usage]]
 ==== Using the `annotated-text` field
 
-The `annotated-text` tokenizes text content as per the more common `text` field (see 
+The `annotated-text` field tokenizes text content as per the more common {ref}/text.html[`text`] field (see
 "limitations" below) but also injects any marked-up annotation tokens directly into
 the search index:
 

diff --git a/docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc b/docs/reference/aggregations/bucket/diversified-sampler-aggregation.asciidoc
@@ -181,7 +181,7 @@ Each option will hold up to `shard_size` values in memory while performing de-du
  - hold ordinals of the field as determined by the Lucene index (`global_ordinals`)
  - hold hashes of the field values - with potential for hash collisions (`bytes_hash`)
 
-The default setting is to use `global_ordinals` if this information is available from the Lucene index and reverting to `map` if not.
+The default setting is to use <<eager-global-ordinals,`global_ordinals`>> if this information is available from the Lucene index, and to revert to `map` if not.
 The `bytes_hash` setting may prove faster in some cases but introduces the possibility of false positives in de-duplication logic due to the possibility of hash collisions.
 Please note that Elasticsearch will ignore the choice of execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.
 

diff --git a/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc b/docs/reference/aggregations/bucket/significantterms-aggregation.asciidoc
@@ -553,7 +553,7 @@ A description of the different collection modes can be found in the
 There are different mechanisms by which terms aggregations can be executed:
 
  - by using field values directly in order to aggregate data per-bucket (`map`)
- - by using global ordinals of the field and allocating one bucket per global ordinal (`global_ordinals`)
+ - by using <<eager-global-ordinals,global ordinals>> of the field and allocating one bucket per global ordinal (`global_ordinals`)
 
 Elasticsearch tries to have sensible defaults so this is something that generally doesn't need to be configured.
 

diff --git a/docs/reference/cat/fielddata.asciidoc b/docs/reference/cat/fielddata.asciidoc
@@ -4,8 +4,8 @@
 <titleabbrev>cat fielddata</titleabbrev>
 ++++
 
-Returns the amount of heap memory currently used by fielddata on every data node
-in the cluster.
+Returns the amount of heap memory currently used by the
+<<modules-fielddata, field data cache>> on every data node in the cluster.
 
 
 [[cat-fielddata-api-request]]

diff --git a/docs/reference/cluster/stats.asciidoc b/docs/reference/cluster/stats.asciidoc
@@ -246,7 +246,7 @@ activities.
 
 `fielddata`::
 (object)
-Contains statistics about the field data cache of selected nodes.
+Contains statistics about the <<modules-fielddata, field data cache>> of selected nodes.
 +
 .Properties of `fielddata`
 [%collapsible%open]

diff --git a/docs/reference/how-to/search-speed.asciidoc b/docs/reference/how-to/search-speed.asciidoc
@@ -303,13 +303,14 @@ may become much worse.
 [discrete]
 === Warm up global ordinals
 
-Global ordinals are a data-structure that is used in order to run
-<<search-aggregations-bucket-terms-aggregation,`terms`>> aggregations on
-<<keyword,`keyword`>> fields. They are loaded lazily in memory because
-Elasticsearch does not know which fields will be used in `terms` aggregations
-and which fields won't. You can tell Elasticsearch to load global ordinals
-eagerly when starting or refreshing a shard by configuring mappings as
-described below:
+<<eager-global-ordinals,Global ordinals>> are a data structure that is used to
+optimize the performance of aggregations. They are calculated lazily and stored in
+the JVM heap as part of the <<modules-fielddata, field data cache>>. For fields
+that are heavily used for bucketing aggregations, you can tell {es} to construct
+and cache the global ordinals before requests are received. This should be done
+carefully because it will increase heap usage and can make <<indices-refresh, refreshes>>
+take longer. The option can be updated dynamically on an existing mapping by
+setting the <<eager-global-ordinals, eager global ordinals>> mapping parameter:
 
 [source,console]
 --------------------------------------------------
@@ -392,19 +393,19 @@ right number of replicas for you is
 
 === Tune your queries with the Profile API
 
-You can also analyse how expensive each component of your queries and 
-aggregations are using the {ref}/search-profile.html[Profile API]. This might 
-allow you to tune your queries to be less expensive, resulting in a positive 
-performance result and reduced load. Also note that Profile API payloads can be 
-easily visualised for better readability in the 
-{kibana-ref}/xpack-profiler.html[Search Profiler], which is a Kibana dev tools 
+You can use the {ref}/search-profile.html[Profile API] to analyse how
+expensive each component of your queries and aggregations is. This might
+allow you to tune your queries to be less expensive, which improves
+performance and reduces load. Profile API payloads can also be visualised
+for better readability in the
+{kibana-ref}/xpack-profiler.html[Search Profiler], which is a Kibana dev tools
 UI available in all X-Pack licenses, including the free X-Pack Basic license.
 
 Some caveats to the Profile API are that:
 
  - the Profile API as a debugging tool adds significant overhead to search execution and can also have a very verbose output
  - given the added overhead, the resulting took times are not reliable indicators of actual took time, but can be used comparatively between clauses for relative timing differences
- - the Profile API is best for exploring possible reasons behind the most costly clauses of a query but isn't intended for accurately measuring absolute timings of each clause 
+ - the Profile API is best for exploring possible reasons behind the most costly clauses of a query but isn't intended for accurately measuring absolute timings of each clause
 
 [[faster-phrase-queries]]
 === Faster phrase queries with `index_phrases`

diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc
@@ -3,10 +3,12 @@
 
 Each document has an `_id` that uniquely identifies it, which is indexed
 so that documents can be looked up either with the <<docs-get,GET API>> or the
-<<query-dsl-ids-query,`ids` query>>.
+<<query-dsl-ids-query,`ids` query>>. The `_id` can either be assigned at
+indexing time, or a unique `_id` can be generated by {es}. This field is not
+configurable in the mappings.
 
-The value of the `_id` field is accessible in certain queries (`term`,
-`terms`, `match`, `query_string`, `simple_query_string`).
+The value of the `_id` field is accessible in queries such as `term`,
+`terms`, `match`, and `query_string`.
 
 [source,console]
 --------------------------
@@ -33,12 +35,10 @@ GET my-index-000001/_search
 
 <1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
 
-The value of the `_id` field is also accessible in aggregations or for sorting,
-but doing so is discouraged as it requires to load a lot of data in memory. In
-case sorting or aggregating on the `_id` field is required, it is advised to
-duplicate the content of the `_id` field in another field that has `doc_values`
-enabled.
-
+The `_id` field is restricted from use in aggregations, sorting, and scripting.
+If sorting or aggregating on the `_id` field is required, duplicate the
+content of the `_id` field into another field that has `doc_values`
+enabled.
 
 [NOTE]
 ==================================================
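
Review note on the `_id` paragraph in this file: the advice to duplicate `_id` into a field with `doc_values` could be illustrated with a short console sketch like the one below. This is only a sketch under stated assumptions. The index name follows the docs' `my-index-000001` convention, the field name `id_copy` and the document ID `1` are hypothetical, and the client is assumed to write the copied value itself at index time.

```console
# Hypothetical field `id_copy`; `keyword` fields have `doc_values` enabled by default
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "id_copy": { "type": "keyword" }
    }
  }
}

# The client supplies the same value as the document ID (assumption)
PUT my-index-000001/_doc/1
{
  "id_copy": "1"
}

# Sort on the copy rather than on `_id`
GET my-index-000001/_search
{
  "sort": [ { "id_copy": "asc" } ]
}
```

Sorting on the copy uses doc values on disk, so it avoids loading `_id` values into the field data cache.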

diff --git a/docs/reference/mapping/params.asciidoc b/docs/reference/mapping/params.asciidoc
@@ -49,8 +49,6 @@ include::params/eager-global-ordinals.asciidoc[]
 
 include::params/enabled.asciidoc[]
 
-include::params/fielddata.asciidoc[]
-
 include::params/format.asciidoc[]
 
 include::params/ignore-above.asciidoc[]

diff --git a/docs/reference/mapping/params/eager-global-ordinals.asciidoc b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
@@ -34,11 +34,10 @@ to be enabled.
 * Operations on parent and child documents from a `join` field, including
 `has_child` queries and `parent` aggregations.
 
-NOTE: The global ordinal mapping is an on-heap data structure. When measuring
-memory usage, Elasticsearch counts the memory from global ordinals as
-'fielddata'. Global ordinals memory is included in the
-<<fielddata-circuit-breaker, fielddata circuit breaker>>, and is returned
-under `fielddata` in the <<cluster-nodes-stats, node stats>> response.
+NOTE: The global ordinal mapping uses heap memory as part of the
+<<modules-fielddata, field data cache>>. Aggregations on high cardinality fields
+can use a lot of memory and trigger the <<fielddata-circuit-breaker, field data
+circuit breaker>>.
 
 ==== Loading global ordinals
 

diff --git a/docs/reference/mapping/params/fielddata.asciidoc b/docs/reference/mapping/params/fielddata.asciidoc
diff --git a/docs/reference/mapping/types/parent-join.asciidoc b/docs/reference/mapping/types/parent-join.asciidoc
@@ -120,11 +120,11 @@ PUT my-index-000001/_doc/4?routing=1&refresh
 <2> `answer` is the name of the join for this document
 <3> The parent id of this child document
 
-==== Parent-join and performance.
+==== Parent-join and performance
 
 The join field shouldn't be used like joins in a relation database. In Elasticsearch the key to good performance
 is to de-normalize your data into documents. Each join field, `has_child` or `has_parent` query adds a
-significant tax to your query performance.
+significant tax to your query performance. It can also cause <<eager-global-ordinals, global ordinals>> to be built.
 
 The only case where the join field makes sense is if your data contains a one-to-many relationship where
 one entity significantly outnumbers the other entity. An example of such case is a use case with products