[[eager-global-ordinals]]
=== `eager_global_ordinals`

Global ordinals are a data structure built on top of doc values that assigns
an incremental number to each unique term, in lexicographic order. Each term
has a unique number, and the number of term 'A' is lower than the number of
term 'B'. Global ordinals are only supported on <<keyword,`keyword`>> and
<<text,`text`>> fields. On `keyword` fields they are available by default, but
`text` fields can only use them when `fielddata`, with all of its associated
baggage, is enabled.

Doc values (and fielddata) also have ordinals, which are a unique numbering of
all terms in a particular segment and field. Global ordinals build on top of
this by providing a mapping between the segment ordinals and the global
ordinals, the latter being unique across the entire shard. Because global
ordinals for a specific field are tied to _all the segments of a shard_, they
need to be entirely rebuilt whenever a new segment becomes visible.

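The relationship between segment ordinals and global ordinals can be
illustrated with a small Python sketch. It is not how Lucene or Elasticsearch
implement the structure: the `build_global_ordinals` function and the list
layout are hypothetical, and each segment is assumed to keep its unique terms
in sorted order so that a term's position is its segment ordinal.

[source,python]
------------
def build_global_ordinals(segments):
    """Build shard-wide ordinals from per-segment sorted term lists.

    segments[i][seg_ord] is the term that segment i assigns to the segment
    ordinal seg_ord (hypothetical layout, for illustration only).
    """
    # Global ordinals number the union of all terms in lexicographic order.
    global_terms = sorted(set().union(*segments))
    term_to_global = {term: ord_ for ord_, term in enumerate(global_terms)}
    # One lookup table per segment: segment ordinal -> global ordinal.
    seg_to_global = [[term_to_global[t] for t in terms] for terms in segments]
    return global_terms, seg_to_global


segments = [
    ["apple", "cherry"],           # segment 0: apple=0, cherry=1
    ["banana", "cherry", "date"],  # segment 1: banana=0, cherry=1, date=2
]
global_terms, seg_to_global = build_global_ordinals(segments)
print(global_terms)   # ['apple', 'banana', 'cherry', 'date']
print(seg_to_global)  # [[0, 2], [1, 2, 3]]
------------

In this toy model, adding a third segment can change the union of terms and
therefore every global number, which mirrors why the mapping has to be rebuilt
as a whole when a new segment becomes visible.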

Global ordinals are used by features that rely on segment ordinals, such as
the <<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>,
to improve execution time. A `terms` aggregation relies purely on global
ordinals to perform the aggregation at the shard level, then converts global
ordinals to the real terms only for the final reduce phase, which combines
results from different shards.

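As a rough illustration of the shard-level step, the following Python sketch
counts documents per global ordinal and resolves ordinals to terms only at the
end. The function names, the single-valued field, and the data layout are
assumptions made for this example and do not reflect the actual aggregation
code.

[source,python]
------------
from collections import Counter


def shard_terms_counts(seg_to_global, seg_docs):
    """Count documents per global ordinal without touching term strings.

    seg_to_global[i][seg_ord] is the global ordinal for a segment ordinal;
    seg_docs[i] holds one segment ordinal per document (single-valued field).
    """
    counts = Counter()
    for mapping, doc_ords in zip(seg_to_global, seg_docs):
        for seg_ord in doc_ords:
            counts[mapping[seg_ord]] += 1
    return counts


def resolve_terms(counts, global_terms):
    """Convert global ordinals to real terms, as needed before the reduce."""
    return {global_terms[ord_]: n for ord_, n in sorted(counts.items())}


# Hypothetical shard with two segments.
global_terms = ["apple", "banana", "cherry", "date"]
seg_to_global = [[0, 2], [1, 2, 3]]
seg_docs = [[0, 1, 1], [1, 0, 2, 1]]  # segment ordinal of each document

counts = shard_terms_counts(seg_to_global, seg_docs)
shard_result = resolve_terms(counts, global_terms)
print(shard_result)  # {'apple': 1, 'banana': 1, 'cherry': 4, 'date': 1}
------------

Counting on small integers keeps the inner loop free of string comparisons.
Terms are looked up once per bucket rather than once per document.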

The loading time of global ordinals depends on the number of terms in a field,
but in general it is low, since the source field data has already been loaded.
The memory overhead of global ordinals is small because they are very
efficiently compressed.

By default, global ordinals are loaded at search-time, which is the right
trade-off if you are optimizing for indexing speed. However, if you care more
about search speed, it could be beneficial to set
`eager_global_ordinals: true` on fields that you plan to use in `terms`
aggregations:

[source,js]
------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
------------
// CONSOLE
// TEST[s/^/PUT my_index\n/]

This will shift the cost from search-time to refresh-time. Elasticsearch will
make sure that global ordinals are built before publishing updates to the
content of the index.

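The shift in cost can be pictured with a toy Python model. The `FieldOrdinals`
class below is purely hypothetical and only records which operation ends up
paying for the rebuild; it does not model real build times or Elasticsearch
internals.

[source,python]
------------
class FieldOrdinals:
    """Toy model of when global ordinals get built (not Elasticsearch code)."""

    def __init__(self, eager):
        self.eager = eager
        self.built = False
        self.log = []  # records which operation paid the build cost

    def _build(self, paid_by):
        self.built = True
        self.log.append(paid_by)

    def refresh(self):
        # A new segment becomes visible, so existing global ordinals are stale.
        self.built = False
        if self.eager:
            self._build("refresh")

    def search(self):
        # Lazy loading: the first search after a refresh pays for the rebuild.
        if not self.built:
            self._build("search")


lazy, eager = FieldOrdinals(eager=False), FieldOrdinals(eager=True)
for field in (lazy, eager):
    field.refresh()
    field.search()
    field.search()

print(lazy.log)   # ['search']
print(eager.log)  # ['refresh']
------------

In both cases the rebuild happens once per refresh. The setting only changes
whether the refresh or the first search afterwards absorbs it.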

If you ever decide that you do not need to run `terms` aggregations on this
field anymore, then you can disable eager loading of global ordinals at any
time:

[source,js]
------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": false
    }
  }
}
------------
// CONSOLE
// TEST[continued]