[[eager-global-ordinals]]
=== `eager_global_ordinals`

==== What are global ordinals?

To support aggregations and other operations that require looking up field
values on a per-document basis, Elasticsearch uses a data structure called
<<doc-values, doc values>>. Term-based field types such as `keyword` store
their doc values using an ordinal mapping for a more compact representation.
This mapping works by assigning each term an incremental integer or 'ordinal'
based on its lexicographic order. The field's doc values store only the
ordinals for each document instead of the original terms, with a separate
lookup structure to convert between ordinals and terms. For example, if a field
contains the terms `apple`, `banana`, and `cherry`, they receive the ordinals
0, 1, and 2, and a document containing `banana` stores only the ordinal 1.

When used during aggregations, ordinals can greatly improve performance. As an
example, the `terms` aggregation relies only on ordinals to collect documents
into buckets at the shard level, then converts the ordinals back to their
original term values when combining results across shards.

Each index segment defines its own ordinal mapping, but aggregations collect
data across an entire shard. To use ordinals for shard-level operations such as
aggregations, Elasticsearch creates a unified mapping called 'global ordinals'.
The global ordinal mapping is built on top of segment ordinals, and works by
maintaining a map from each global ordinal to the local ordinal for each
segment.

Global ordinals are used if a search contains any of the following components:

* Certain bucket aggregations on `keyword`, `ip`, and `flattened` fields. This
includes `terms` aggregations as mentioned above, as well as `composite`,
`diversified_sampler`, and `significant_terms`.
* Bucket aggregations on `text` fields that require <<fielddata, `fielddata`>>
to be enabled.
* Operations on parent and child documents from a `join` field, including
`has_child` queries and `parent` aggregations.

NOTE: The global ordinal mapping is an on-heap data structure. When measuring
memory usage, Elasticsearch counts the memory from global ordinals as
'fielddata'. Global ordinals memory is included in the
<<fielddata-circuit-breaker, fielddata circuit breaker>>, and is returned
under `fielddata` in the <<cluster-nodes-stats, node stats>> response.
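
As a minimal sketch, the memory attributed to a single field can be inspected
with the node stats API. The `fields` parameter restricts the fielddata
breakdown to the named fields. The field name `tags` is an assumption borrowed
from the mapping examples in this section.

[source,console]
------------
GET _nodes/stats/indices/fielddata?fields=tags
------------

Because global ordinals are counted as fielddata, the `memory_size_in_bytes`
value reported for the field does not separate them from any other fielddata
loaded for that field.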

==== Loading global ordinals

The global ordinal mapping must be built before ordinals can be used during a
search. By default, the mapping is built at search time, the first time that
global ordinals are needed. This is the right approach if you are optimizing
for indexing speed, but if search performance is a priority, it's recommended
to eagerly load global ordinals on fields that will be used in aggregations:

[source,console]
------------
PUT my-index-000001/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
------------
// TEST[s/^/PUT my-index-000001\n/]

When `eager_global_ordinals` is enabled, global ordinals are built when a shard
is <<indices-refresh, refreshed>>. Elasticsearch always loads them before
exposing changes to the content of the index. This shifts the cost of building
global ordinals from search time to index time. Elasticsearch will also eagerly
build global ordinals when creating a new copy of a shard, as can occur when
increasing the number of replicas or relocating a shard onto a new node.

Eager loading can be disabled at any time by updating the
`eager_global_ordinals` setting:

[source,console]
------------
PUT my-index-000001/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": false
    }
  }
}
------------
// TEST[continued]

IMPORTANT: On a <<frozen-indices,frozen index>>, global ordinals are discarded
after each search and rebuilt again when they're requested. This means that
`eager_global_ordinals` should not be used on frozen indices: it would
cause global ordinals to be reloaded on every search. Instead, the index should
be force-merged to a single segment before being frozen. This avoids building
global ordinals altogether (more details can be found in the next section).

==== Avoiding global ordinal loading

Usually, global ordinals do not present a large overhead in terms of their
loading time and memory usage. However, loading global ordinals can be
expensive on indices with large shards, or if the fields contain a large
number of unique term values. Because global ordinals provide a unified mapping
for all segments on the shard, they also need to be rebuilt entirely when a new
segment becomes visible.

In some cases it is possible to avoid global ordinal loading altogether:

* The `terms`, `diversified_sampler`, and `significant_terms` aggregations
support a parameter
<<search-aggregations-bucket-terms-aggregation-execution-hint, `execution_hint`>>
that helps control how buckets are collected. It defaults to `global_ordinals`,
but can be set to `map` to instead use the term values directly.
* If a shard has been <<indices-forcemerge,force-merged>> down to a single
segment, then its segment ordinals are already 'global' to the shard. In this
case, Elasticsearch does not need to build a global ordinal mapping and there
is no additional overhead from using global ordinals. Note that for performance
reasons you should only force-merge an index to which you will never write
again.
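
The two options above can be sketched as follows. The index `my-index-000001`
and the field `tags` are assumptions carried over from the earlier examples,
and the aggregation name `tags` is hypothetical. The first request asks the
`terms` aggregation to collect buckets from term values rather than global
ordinals. Elasticsearch treats this as a hint and may ignore it when it is not
applicable.

[source,console]
------------
GET my-index-000001/_search
{
  "size": 0,
  "aggs": {
    "tags": {
      "terms": {
        "field": "tags",
        "execution_hint": "map"
      }
    }
  }
}
------------
// TEST[continued]

The second request force-merges each shard of the index down to a single
segment, which is only advisable once the index no longer receives writes:

[source,console]
------------
POST my-index-000001/_forcemerge?max_num_segments=1
------------
// TEST[continued]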