OpenSearch/docs/reference/mapping/params/eager-global-ordinals.asciidoc

88 lines
3.6 KiB
Plaintext

[[eager-global-ordinals]]
=== `eager_global_ordinals`
Global ordinals is a data-structure on top of doc values, that maintains an
incremental numbering for each unique term in a lexicographic order. Each
term has a unique number and the number of term 'A' is lower than the
number of term 'B'. Global ordinals are only supported with
<<keyword,`keyword`>> and <<text,`text`>> fields. In `keyword` fields, they
are available by default but `text` fields can only use them when `fielddata`,
with all of its associated baggage, is enabled.
Doc values (and fielddata) also have ordinals, which is a unique numbering for
all terms in a particular segment and field. Global ordinals just build on top
of this, by providing a mapping between the segment ordinals and the global
ordinals, the latter being unique across the entire shard. Given that global
ordinals for a specific field are tied to _all the segments of a shard_, they
need to be entirely rebuilt whenever a once new segment becomes visible.
Global ordinals are used for features that use segment ordinals, such as
the <<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>,
to improve the execution time. A terms aggregation relies purely on global
ordinals to perform the aggregation at the shard level, then converts global
ordinals to the real term only for the final reduce phase, which combines
results from different shards.
The loading time of global ordinals depends on the number of terms in a field,
but in general it is low, since it source field data has already been loaded.
The memory overhead of global ordinals is a small because it is very
efficiently compressed.
By default, global ordinals are loaded at search-time, which is the right
trade-off if you are optimizing for indexing speed. However, if you are more
interested in search speed, it could be beneficial to set
`eager_global_ordinals: true` on fields that you plan to use in terms
aggregations:
[source,console]
------------
PUT my_index/_mapping
{
"properties": {
"tags": {
"type": "keyword",
"eager_global_ordinals": true
}
}
}
------------
// TEST[s/^/PUT my_index\n/]
This will shift the cost of building the global ordinals from search-time to
refresh-time. Elasticsearch will make sure that global ordinals are built
before exposing to searches any changes to the content of the index.
Elasticsearch will also eagerly build global ordinals when starting a new copy
of a shard, such as when increasing the number of replicas or when relocating a
shard onto a new node.
If a shard has been <<indices-forcemerge,force-merged>> down to a single
segment then its global ordinals are identical to the ordinals for its unique
segment, which means there is no extra cost for using global ordinals on such a
shard. Note that for performance reasons you should only force-merge an index
to which you will never write again.
On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
search and rebuilt again on the next search if needed or if
`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
be used on frozen indices. Instead, force-merge an index to a single segment
before freezing it so that global ordinals need not be built separately on each
search.
If you ever decide that you do not need to run `terms` aggregations on this
field anymore, then you can disable eager loading of global ordinals at any
time:
[source,console]
------------
PUT my_index/_mapping
{
"properties": {
"tags": {
"type": "keyword",
"eager_global_ordinals": false
}
}
}
------------
// TEST[continued]