[[search-aggregations-bucket-terms-aggregation]]
=== Terms

A multi-bucket, value source based aggregation in which buckets are built dynamically, one per unique value.

Example:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { "field" : "gender" }
        }
    }
}
--------------------------------------------------

Response:

[source,js]
--------------------------------------------------
{
    ...

    "aggregations" : {
        "genders" : {
            "buckets" : [
                {
                    "key" : "male",
                    "doc_count" : 10
                },
                {
                    "key" : "female",
                    "doc_count" : 10
                }
            ]
        }
    }
}
--------------------------------------------------

By default, the `terms` aggregation returns the buckets for the top ten terms, ordered by `doc_count`. This default
can be changed by setting the `size` parameter.
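
As a minimal sketch, a request asking for the top five terms instead of the default ten might look like the following.
The `genders` name and `gender` field are carried over from the example above; the value `5` is an arbitrary choice
made for illustration.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "size" : 5
            }
        }
    }
}
--------------------------------------------------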

==== Size & Shard Size

The `size` parameter defines how many term buckets are returned out of the overall terms list. By default, the node
coordinating the search asks each shard for its own top `size` term buckets and, once all shards have responded,
reduces the results to the final list that is returned to the client. This means that if the number of unique terms is
greater than `size`, the returned list may not be accurate: the term counts can be slightly off, and a term that should
have been among the top `size` buckets may not be returned at all.
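
The effect can be illustrated with a small stand-alone simulation. This is only a sketch of the reduce step described
above, not the actual implementation: the per-shard counts are invented, and `topTerms` and `mergeCounts` are
hypothetical helper names introduced for this example.

[source,js]
--------------------------------------------------
// Top `n` [term, count] pairs of a { term: count } map, highest count first.
function topTerms(counts, n) {
    return Object.entries(counts)
        .sort((x, y) => y[1] - x[1])
        .slice(0, n);
}

// Sum a list of [term, count] pairs into a single { term: count } map.
function mergeCounts(pairs) {
    const merged = {};
    for (const [term, count] of pairs) {
        merged[term] = (merged[term] || 0) + count;
    }
    return merged;
}

// Hypothetical per-shard term counts, invented for illustration.
const shards = [
    { a: 10, b: 6, c: 5 },
    { c: 9, b: 7, a: 1 }
];
const size = 2;

// What the coordinating node sees: only the top `size` terms of each shard.
const reduced = topTerms(mergeCounts(shards.flatMap(s => topTerms(s, size))), size);

// What an exact computation over every term of every shard would give.
const exact = topTerms(mergeCounts(shards.flatMap(s => Object.entries(s))), size);

console.log(reduced); // b = 13, a = 10
console.log(exact);   // c = 14, b = 13
--------------------------------------------------

With these counts, `c` is the most frequent term overall but is missing from the reduced list, because the first shard
did not report it among its top two. The count for `a` is also off by one, because the second shard did not report it.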

The higher the requested `size`, the more accurate the results, but also the more expensive it is to compute them
(both because of the bigger priority queues managed at the shard level and because of the bigger data transfers
between the nodes and the client).

The `shard_size` parameter can be used to minimize the extra work that comes with a bigger requested `size`. When
defined, it determines how many terms the coordinating node requests from each shard. Once all the shards have
responded, the coordinating node reduces them to a final result based on the `size` parameter. This way, one can
increase the accuracy of the returned terms while avoiding the overhead of streaming a big list of buckets back to
the client.
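
A request combining the two parameters might look like the following sketch. The values `5` and `20` are arbitrary and
chosen only for illustration: each shard is asked for its top 20 terms, while only the top 5 of the reduced list are
returned to the client.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : {
                "field" : "gender",
                "size" : 5,
                "shard_size" : 20
            }
        }
    }
}
--------------------------------------------------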

NOTE: `shard_size` cannot be smaller than `size` (as it doesn't make much sense). When it is, Elasticsearch
      overrides it and resets it to be equal to `size`.


==== Order

The order of the buckets can be customized by setting the `order` parameter. By default, the buckets are ordered by
their `doc_count` descending. It is also possible to change this behaviour as follows:

Ordering the buckets by their `doc_count` in an ascending manner:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { 
                "field" : "gender",
                "order" : { "_count" : "asc" }
            }
        }
    }
}
--------------------------------------------------

Ordering the buckets alphabetically by their terms in an ascending manner:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { 
                "field" : "gender",
                "order" : { "_term" : "asc" }
            }
        }
    }
}
--------------------------------------------------


Ordering the buckets by a single value metrics sub-aggregation (identified by the aggregation name):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { 
                "field" : "gender",
                "order" : { "avg_height" : "desc" }
            },
            "aggs" : {
                "avg_height" : { "avg" : { "field" : "height" } }
            }
        }
    }
}
--------------------------------------------------

Ordering the buckets by a multi value metrics sub-aggregation (identified by the aggregation name):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { 
                "field" : "gender",
                "order" : { "height_stats.avg" : "desc" }
            },
            "aggs" : {
                "height_stats" : { "stats" : { "field" : "height" } }
            }
        }
    }
}
--------------------------------------------------

==== Script

Generating the terms using a script:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { 
                "script" : "doc['gender'].value"
            }
        }
    }
}
--------------------------------------------------

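==== Value Script

When both `field` and `script` are set, the script is applied to each value of the field, which is exposed to the
script as `_value`:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders" : {
            "terms" : { 
                "field" : "gender",
                "script" : "'Gender: ' + _value"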
            }
        }
    }
}
--------------------------------------------------