Docs: Update documentation about execution hints for the terms aggregation.

2014-07-21 11:55:22 +02:00 · 2014-07-21 11:55:22 +02:00 · abeefbddea
parent f3114fe774
commit abeefbddea
1 changed file with 27 additions and 10 deletions
--- a/docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc
+++ b/docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc
@@ -395,15 +395,32 @@ this would typically be too costly in terms of RAM.

 ==== Execution hint

-added[1.2.0] The `global_ordinals` execution mode
+added[1.2.0] Added the `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality` execution modes

-There are three mechanisms by which terms aggregations can be executed: either by using field values directly in order to aggregate
-data per-bucket (`map`), by using ordinals of the field values instead of the values themselves (`ordinals`) or by using global
-ordinals of the field (`global_ordinals`). The latter is faster, especially for fields with many unique
-values. However it can be slower if only a few documents match, when for example a terms aggregator is nested in another
-aggregator, this applies for both `ordinals` and `global_ordinals` execution modes. Elasticsearch tries to have sensible
-defaults when it comes to the execution mode that should be used, but  in case you know that one execution mode may
-perform better than the other one, you have the ability to "hint" it to Elasticsearch:
+deprecated[1.3.0] Removed the `ordinals` execution mode
+
+There are different mechanisms by which terms aggregations can be executed:
+
+ - by using field values directly in order to aggregate data per-bucket (`map`)
+ - by using ordinals of the field and preemptively allocating one bucket per ordinal value (`global_ordinals`)
+ - by using ordinals of the field and dynamically allocating one bucket per ordinal value (`global_ordinals_hash`)
+ - by using per-segment ordinals to compute counts and remap these counts to global counts using global ordinals (`global_ordinals_low_cardinality`)
+
+Elasticsearch tries to have sensible defaults, so this generally doesn't need to be configured.
+
+`map` should only be considered when very few documents match a query. Otherwise the ordinals-based execution modes
+are significantly faster. By default, `map` is only used when running an aggregation on scripts, since they don't have
+ordinals.
+
+`global_ordinals_low_cardinality` only works for leaf terms aggregations but is usually the fastest execution mode. Memory
+usage is linear in the number of unique values in the field, so it is only enabled by default on low-cardinality fields.
+
+`global_ordinals` is the second fastest option, but because it preemptively allocates buckets it can be memory-intensive,
+especially if you have one or more sub-aggregations. It is used by default on top-level terms aggregations.
+
+`global_ordinals_hash`, in contrast to `global_ordinals` and `global_ordinals_low_cardinality`, allocates buckets dynamically,
+so memory usage is linear in the number of values of the documents that are part of the aggregation scope. It is used by default
+in inner aggregations.

 [source,js]
 --------------------------------------------------
@@ -419,6 +436,6 @@ perform better than the other one, you have the ability to "hint" it to Elastics
 }
 --------------------------------------------------

-<1> the possible values are `map`, `ordinals` and `global_ordinals`
+<1> the possible values are `map`, `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality`

-Please note that Elasticsearch will ignore this execution hint if it is not applicable.
+Please note that Elasticsearch will ignore this execution hint if it is not applicable, and that there is no backward compatibility guarantee on these hints.
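
Illustration (not part of the commit): the memory trade-off between `global_ordinals` and `global_ordinals_hash` described in the updated text can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch's actual implementation; the function names, the sorted-order ordinal assignment, and the sample data are all hypothetical.

```python
def build_global_ordinals(field_values):
    """Assign a dense integer ordinal to each distinct term (sorted order, assumed)."""
    return {term: i for i, term in enumerate(sorted(set(field_values)))}


def count_preallocated(matching_terms, ordinals):
    """Preemptive allocation: one counter per ordinal, created up front.

    Memory grows with the number of unique values in the field,
    regardless of how many documents match.
    """
    counts = [0] * len(ordinals)
    for term in matching_terms:
        counts[ordinals[term]] += 1
    return counts


def count_hashed(matching_terms, ordinals):
    """Dynamic allocation: a counter is created only when an ordinal is first seen.

    Memory grows with the number of distinct values among matching documents.
    """
    counts = {}
    for term in matching_terms:
        ordinal = ordinals[term]
        counts[ordinal] = counts.get(ordinal, 0) + 1
    return counts


# Hypothetical field with five unique terms; only three documents match.
field_values = ["red", "green", "blue", "cyan", "pink"]
ordinals = build_global_ordinals(field_values)
matching = ["red", "red", "blue"]

dense = count_preallocated(matching, ordinals)   # five slots, mostly zero
sparse = count_hashed(matching, ordinals)        # two entries only
```

In this sketch the preallocated variant holds five counters even though only two terms occur, while the hashed variant holds two. That is the sense in which preallocation gets expensive when a terms aggregation is nested and each parent bucket would need its own full array.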