diff --git a/docs/reference/mapping/fields/parent-field.asciidoc b/docs/reference/mapping/fields/parent-field.asciidoc index 7edaa3949d9..affb962df47 100644 --- a/docs/reference/mapping/fields/parent-field.asciidoc +++ b/docs/reference/mapping/fields/parent-field.asciidoc @@ -119,7 +119,7 @@ GET my_index/_search ==== Global ordinals -Parent-child uses <> to speed up joins. +Parent-child uses <> to speed up joins. Global ordinals need to be rebuilt after any change to a shard. The more parent id values are stored in a shard, the longer it takes to rebuild the global ordinals for the `_parent` field. diff --git a/docs/reference/mapping/params.asciidoc b/docs/reference/mapping/params.asciidoc index 716b3a9bc1f..e33fac6de5d 100644 --- a/docs/reference/mapping/params.asciidoc +++ b/docs/reference/mapping/params.asciidoc @@ -16,6 +16,7 @@ The following mapping parameters are common to some or all field datatypes: * <> * <> * <> +* <> * <> * <> * <> @@ -48,6 +49,8 @@ include::params/dynamic.asciidoc[] include::params/enabled.asciidoc[] +include::params/eager-global-ordinals.asciidoc[] + include::params/fielddata.asciidoc[] include::params/format.asciidoc[] diff --git a/docs/reference/mapping/params/eager-global-ordinals.asciidoc b/docs/reference/mapping/params/eager-global-ordinals.asciidoc new file mode 100644 index 00000000000..7e668621cd9 --- /dev/null +++ b/docs/reference/mapping/params/eager-global-ordinals.asciidoc @@ -0,0 +1,74 @@ +[[eager-global-ordinals]] +=== `eager_global_ordinals` + +Global ordinals is a data-structure on top of doc values, that maintains an +incremental numbering for each unique term in a lexicographic order. Each +term has a unique number and the number of term 'A' is lower than the +number of term 'B'. Global ordinals are only supported with +<> and <> fields. In `keyword` fields, they +are available by default but `text` fields can only use them when `fielddata`, +with all of its associated baggage, is enabled. 
+ +Doc values (and fielddata) also have ordinals, which is a unique numbering for +all terms in a particular segment and field. Global ordinals just build on top +of this, by providing a mapping between the segment ordinals and the global +ordinals, the latter being unique across the entire shard. Given that global +ordinals for a specific field are tied to _all the segments of a shard_, they +need to be entirely rebuilt whenever a new segment becomes visible. + +Global ordinals are used for features that use segment ordinals, such as +the <>, +to improve the execution time. A terms aggregation relies purely on global +ordinals to perform the aggregation at the shard level, then converts global +ordinals to the real term only for the final reduce phase, which combines +results from different shards. + +The loading time of global ordinals depends on the number of terms in a field, +but in general it is low, since the source field data has already been loaded. +The memory overhead of global ordinals is small because it is very +efficiently compressed. + +By default, global ordinals are loaded at search-time, which is the right +trade-off if you are optimizing for indexing speed. However, if you are more +interested in search speed, it could be beneficial to set +`eager_global_ordinals: true` on fields that you plan to use in terms +aggregations: + +[source,js] +------------ +PUT my_index/_mapping/my_type +{ + "properties": { + "tags": { + "type": "keyword", + "eager_global_ordinals": true + } + } +} +------------ +// CONSOLE +// TEST[s/^/PUT my_index\n/] + +This will shift the cost from search-time to refresh-time. Elasticsearch will +make sure that global ordinals are built before publishing updates to the +content of the index. 
+ +If you ever decide that you do not need to run `terms` aggregations on this +field anymore, then you can disable eager loading of global ordinals at any +time: + +[source,js] +------------ +PUT my_index/_mapping/my_type +{ + "properties": { + "tags": { + "type": "keyword", + "eager_global_ordinals": false + } + } +} +------------ +// CONSOLE +// TEST[continued] + diff --git a/docs/reference/mapping/params/fielddata.asciidoc b/docs/reference/mapping/params/fielddata.asciidoc index baca3c426d6..0ba05fdf396 100644 --- a/docs/reference/mapping/params/fielddata.asciidoc +++ b/docs/reference/mapping/params/fielddata.asciidoc @@ -105,40 +105,6 @@ same name in the same index. Its value can be updated on existing fields using the <>. -[[global-ordinals]] -.Global ordinals -***************************************** - -Global ordinals is a data-structure on top of fielddata and doc values, that -maintains an incremental numbering for each unique term in a lexicographic -order. Each term has a unique number and the number of term 'A' is lower than -the number of term 'B'. Global ordinals are only supported on <> -and <> fields. - -Fielddata and doc values also have ordinals, which is a unique numbering for -all terms in a particular segment and field. Global ordinals just build on top -of this, by providing a mapping between the segment ordinals and the global -ordinals, the latter being unique across the entire shard. - -Global ordinals are used for features that use segment ordinals, such as -sorting and the terms aggregation, to improve the execution time. A terms -aggregation relies purely on global ordinals to perform the aggregation at the -shard level, then converts global ordinals to the real term only for the final -reduce phase, which combines results from different shards. - -Global ordinals for a specified field are tied to _all the segments of a -shard_, while fielddata and doc values ordinals are tied to a single segment. 
-which is different than for field data for a specific field which is tied to a -single segment. For this reason global ordinals need to be entirely rebuilt -whenever a once new segment becomes visible. - -The loading time of global ordinals depends on the number of terms in a field, -but in general it is low, since it source field data has already been loaded. -The memory overhead of global ordinals is a small because it is very -efficiently compressed. - -***************************************** - [[field-data-filtering]] ==== `fielddata_frequency_filter` diff --git a/docs/reference/mapping/types/keyword.asciidoc b/docs/reference/mapping/types/keyword.asciidoc index e560f8ae1d0..7f695654529 100644 --- a/docs/reference/mapping/types/keyword.asciidoc +++ b/docs/reference/mapping/types/keyword.asciidoc @@ -48,7 +48,7 @@ The following parameters are accepted by `keyword` fields: can later be used for sorting, aggregations, or scripting? Accepts `true` (default) or `false`. -<>:: +<>:: Should global ordinals be loaded eagerly on refresh? Accepts `true` or `false` (default). Enabling this is a good idea on fields that are frequently used for diff --git a/docs/reference/mapping/types/text.asciidoc b/docs/reference/mapping/types/text.asciidoc index 0bb9b00a102..f1c980ce092 100644 --- a/docs/reference/mapping/types/text.asciidoc +++ b/docs/reference/mapping/types/text.asciidoc @@ -57,7 +57,7 @@ The following parameters are accepted by `text` fields: Mapping field-level query time boosting. Accepts a floating point number, defaults to `1.0`. -<>:: +<>:: Should global ordinals be loaded eagerly on refresh? Accepts `true` or `false` (default). Enabling this is a good idea on fields that are frequently used for