[[search-aggregations-bucket-terms-aggregation]]
=== Terms

A multi-bucket, value-source-based aggregation in which buckets are built dynamically, one per unique value.

Example:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { "field" : "gender" }
        }
    }
}
--------------------------------------------------

Response:

[source,js]
--------------------------------------------------
{
    ...

    "aggregations" : {
        "genders" : {
            "buckets" : [
                {
                    "key" : "male",
                    "doc_count" : 10
                },
                {
                    "key" : "female",
                    "doc_count" : 10
                }
            ]
        }
    }
}
--------------------------------------------------

By default, the `terms` aggregation returns the buckets for the top ten terms, ordered by `doc_count`. This default
can be changed by setting the `size` parameter.

==== Size & Shard Size

The `size` parameter defines how many term buckets are returned out of the overall terms list. By default, the node
coordinating the search asks each shard for its own top `size` term buckets and, once all shards have responded,
reduces the results to the final list that is returned to the client. This means that if the number of unique terms
is greater than `size`, the returned list may not be fully accurate: the term counts can be slightly off, and a term
that belongs in the top `size` buckets may be missing altogether. For example, a term that ranks just below the top
`size` on every shard is never reported by any shard, even if its combined count would place it in the overall top
`size`.

The higher the requested `size`, the more accurate the results, but also the more expensive they are to compute
(both because bigger priority queues are managed at the shard level and because more data is transferred between
the nodes and the client).
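
As an illustration, the request below asks for only the top five terms. It is a minimal sketch that reuses the
`gender` field from the example above; the value `5` is arbitrary and chosen purely for demonstration:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "size" : 5
            }
        }
    }
}
--------------------------------------------------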

The `shard_size` parameter can be used to minimize the extra work that comes with a bigger requested `size`. When
defined, it determines how many terms the coordinating node requests from each shard. Once all the shards have
responded, the coordinating node reduces them to a final result based on the `size` parameter. This way, one can
increase the accuracy of the returned terms while avoiding the overhead of streaming a big list of buckets back to
the client.
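
A request that combines both parameters might look like the following sketch. The values are illustrative
assumptions, not recommendations: each shard is asked for its top 50 terms, and the coordinating node reduces them
to a final list of 10:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "size" : 10,
                "shard_size" : 50
            }
        }
    }
}
--------------------------------------------------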

NOTE: `shard_size` cannot be smaller than `size`, as that would make little sense. When it is, Elasticsearch
overrides it and resets it to be equal to `size`.


==== Order

The order of the buckets can be customized by setting the `order` parameter. By default, the buckets are ordered by
their `doc_count` in descending order. This behaviour can be changed as follows:

Ordering the buckets by their `doc_count` in ascending order:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "order" : { "_count" : "asc" }
            }
        }
    }
}
--------------------------------------------------

Ordering the buckets alphabetically by their terms in ascending order:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "order" : { "_term" : "asc" }
            }
        }
    }
}
--------------------------------------------------

Ordering the buckets by a single-value metrics sub-aggregation (identified by the aggregation name):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "order" : { "avg_height" : "desc" }
            },
            "aggs" : {
                "avg_height" : { "avg" : { "field" : "height" } }
            }
        }
    }
}
--------------------------------------------------

Ordering the buckets by a multi-value metrics sub-aggregation (identified by the aggregation name):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "order" : { "height_stats.avg" : "desc" }
            },
            "aggs" : {
                "height_stats" : { "stats" : { "field" : "height" } }
            }
        }
    }
}
--------------------------------------------------

==== Script

Generating the terms using a script:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "script" : "doc['gender'].value"
            }
        }
    }
}
--------------------------------------------------

==== Value Script

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "script" : "doc['gender'].value"
            }
        }
    }
}
--------------------------------------------------
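
In other aggregations, a value script receives each value extracted from the field through the `_value` variable.
Assuming the same holds for the `terms` aggregation, a value script that rewrites each term might be sketched as
follows (the `'Gender: '` prefix is purely illustrative):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "script" : "'Gender: ' + _value"
            }
        }
    }
}
--------------------------------------------------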