88 lines
3.6 KiB
Plaintext
88 lines
3.6 KiB
Plaintext
[[eager-global-ordinals]]
|
|
=== `eager_global_ordinals`
|
|
|
|
Global ordinals is a data-structure on top of doc values, that maintains an
|
|
incremental numbering for each unique term in a lexicographic order. Each
|
|
term has a unique number and the number of term 'A' is lower than the
|
|
number of term 'B'. Global ordinals are only supported with
|
|
<<keyword,`keyword`>> and <<text,`text`>> fields. In `keyword` fields, they
|
|
are available by default but `text` fields can only use them when `fielddata`,
|
|
with all of its associated baggage, is enabled.
|
|
|
|
Doc values (and fielddata) also have ordinals, which is a unique numbering for
|
|
all terms in a particular segment and field. Global ordinals just build on top
|
|
of this, by providing a mapping between the segment ordinals and the global
|
|
ordinals, the latter being unique across the entire shard. Given that global
|
|
ordinals for a specific field are tied to _all the segments of a shard_, they
|
|
need to be entirely rebuilt whenever a once new segment becomes visible.
|
|
|
|
Global ordinals are used for features that use segment ordinals, such as
|
|
the <<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>,
|
|
to improve the execution time. A terms aggregation relies purely on global
|
|
ordinals to perform the aggregation at the shard level, then converts global
|
|
ordinals to the real term only for the final reduce phase, which combines
|
|
results from different shards.
|
|
|
|
The loading time of global ordinals depends on the number of terms in a field,
|
|
but in general it is low, since it source field data has already been loaded.
|
|
The memory overhead of global ordinals is a small because it is very
|
|
efficiently compressed.
|
|
|
|
By default, global ordinals are loaded at search-time, which is the right
|
|
trade-off if you are optimizing for indexing speed. However, if you are more
|
|
interested in search speed, it could be beneficial to set
|
|
`eager_global_ordinals: true` on fields that you plan to use in terms
|
|
aggregations:
|
|
|
|
[source,console]
|
|
------------
|
|
PUT my_index/_mapping
|
|
{
|
|
"properties": {
|
|
"tags": {
|
|
"type": "keyword",
|
|
"eager_global_ordinals": true
|
|
}
|
|
}
|
|
}
|
|
------------
|
|
// TEST[s/^/PUT my_index\n/]
|
|
|
|
This will shift the cost of building the global ordinals from search-time to
|
|
refresh-time. Elasticsearch will make sure that global ordinals are built
|
|
before exposing to searches any changes to the content of the index.
|
|
Elasticsearch will also eagerly build global ordinals when starting a new copy
|
|
of a shard, such as when increasing the number of replicas or when relocating a
|
|
shard onto a new node.
|
|
|
|
If a shard has been <<indices-forcemerge,force-merged>> down to a single
|
|
segment then its global ordinals are identical to the ordinals for its unique
|
|
segment, which means there is no extra cost for using global ordinals on such a
|
|
shard. Note that for performance reasons you should only force-merge an index
|
|
to which you will never write again.
|
|
|
|
On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
|
|
search and rebuilt again on the next search if needed or if
|
|
`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
|
|
be used on frozen indices. Instead, force-merge an index to a single segment
|
|
before freezing it so that global ordinals need not be built separately on each
|
|
search.
|
|
|
|
If you ever decide that you do not need to run `terms` aggregations on this
|
|
field anymore, then you can disable eager loading of global ordinals at any
|
|
time:
|
|
|
|
[source,console]
|
|
------------
|
|
PUT my_index/_mapping
|
|
{
|
|
"properties": {
|
|
"tags": {
|
|
"type": "keyword",
|
|
"eager_global_ordinals": false
|
|
}
|
|
}
|
|
}
|
|
------------
|
|
// TEST[continued]
|