Docs: Update documentation about execution hints for the terms aggregation.

This commit is contained in:
Adrien Grand 2014-07-21 11:55:22 +02:00
parent f3114fe774
commit abeefbddea
1 changed files with 27 additions and 10 deletions

View File

@ -395,15 +395,32 @@ this would typically be too costly in terms of RAM.
==== Execution hint
added[1.2.0] The `global_ordinals` execution mode
added[1.2.0] Added the `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality` execution modes
There are three mechanisms by which terms aggregations can be executed: either by using field values directly in order to aggregate
data per-bucket (`map`), by using ordinals of the field values instead of the values themselves (`ordinals`) or by using global
ordinals of the field (`global_ordinals`). The latter is faster, especially for fields with many unique
values. However it can be slower if only a few documents match, when for example a terms aggregator is nested in another
aggregator, this applies for both `ordinals` and `global_ordinals` execution modes. Elasticsearch tries to have sensible
defaults when it comes to the execution mode that should be used, but in case you know that one execution mode may
perform better than the other one, you have the ability to "hint" it to Elasticsearch:
deprecated[1.3.0] Removed the `ordinals` execution mode
There are different mechanisms by which terms aggregations can be executed:
- by using field values directly in order to aggregate data per-bucket (`map`)
- by using ordinals of the field and preemptively allocating one bucket per ordinal value (`global_ordinals`)
- by using ordinals of the field and dynamically allocating one bucket per ordinal value (`global_ordinals_hash`)
- by using per-segment ordinals to compute counts and remap these counts to global counts using global ordinals (`global_ordinals_low_cardinality`)
Elasticsearch tries to have sensible defaults so this is something that generally doesn't need to be configured.
`map` should only be considered when very few documents match a query. Otherwise the ordinals-based execution modes
are significantly faster. By default, `map` is only used when running an aggregation on scripts, since they don't have
ordinals.
`global_ordinals_low_cardinality` only works for leaf terms aggregations but is usually the fastest execution mode. Memory
usage is linear with the number of unique values in the field, so it is only enabled by default on low-cardinality fields.
`global_ordinals` is the second fastest option, but the fact that it preemptively allocates buckets can be memory-intensive,
especially if you have one or more sub aggregations. It is used by default on top-level terms aggregations.
`global_ordinals_hash` on the contrary to `global_ordinals` and `global_ordinals_low_cardinality` allocates buckets dynamically
so memory usage is linear to the number of values of the documents that are part of the aggregation scope. It is used by default
in inner aggregations.
[source,js]
--------------------------------------------------
@ -419,6 +436,6 @@ perform better than the other one, you have the ability to "hint" it to Elastics
}
--------------------------------------------------
<1> the possible values are `map`, `ordinals` and `global_ordinals`
<1> the possible values are `map`, `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality`
Please note that Elasticsearch will ignore this execution hint if it is not applicable.
Please note that Elasticsearch will ignore this execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints.