OpenSearch/docs/reference/search/facets/terms-facet.asciidoc

[[search-facets-terms-facet]]
=== Terms Facet

Allow to specify field facets that return the N most frequent terms. For
example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

It is preferred to have the terms facet executed on a non analyzed
field, or a field without a large number of terms it breaks to.

==== Accuracy Control

The `size` parameter defines how many top terms should be returned out
of the overall terms list. By default, the node coordinating the
search process will ask each shard to provide its own top `size` terms
and once all shards respond, it will reduce the results to the final list
that will then be sent back to the client. This means that if the number
of unique terms is greater than `size`, the returned list is slightly off
and not accurate (it could be that the term counts are slightly off and it
could even be that a term that should have been in the top `size` entries
was not returned).

The higher the requested `size` is, the more accurate the results will be,
but also, the more expensive it will be to compute the final results (both
due to bigger priority queues that are managed on a shard level and due to
bigger data transfers between the nodes and the client). In an attempt to
minimize the extra work that comes with bigger requested `size` the
`shard_size` parameter was introduced. When defined, it will determine
how many terms the coordinating node will request from each shard. Once
all the shards responded, the coordinating node will then reduce them
to a final result which will be based on the `size` parameter - this way,
one can increase the accuracy of the returned terms and avoid the overhead
of streaming a big list of terms back to the client.

Note that `shard_size` cannot be smaller than `size`... if that's the case
elasticsearch will override it and reset it to be equal to `size`.


==== Ordering

Allow to control the ordering of the terms facets, to be ordered by
`count`, `term`, `reverse_count` or `reverse_term`. The default is
`count`. Here is an example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "order" : "term"
            }
        }
    }
}
--------------------------------------------------

==== All Terms

Allow to get all the terms in the terms facet, ones that do not match a
hit, will have a count of 0. Note, this should not be used with fields
that have many terms.

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "all_terms" : true
            }
        }
    }
}
--------------------------------------------------

==== Excluding Terms

It is possible to specify a set of terms that should be excluded from
the terms facet request result:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "exclude" : ["term1", "term2"]
            }
        }
    }
}
--------------------------------------------------

==== Regex Patterns

The terms API allows to define regex expression that will control which
terms will be included in the faceted list, here is an example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "regex" : "_regex expression here_",
                "regex_flags" : "DOTALL"
            }
        }
    }
}
--------------------------------------------------

Check
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary[Java
Pattern API] for more details about `regex_flags` options.

==== Term Scripts

Allow to define a script for terms facet to process the actual term that
will be used in the term facet collection, and also optionally control
its inclusion or not.

The script can either return a boolean value, with `true` to include it
in the facet collection, and `false` to exclude it from the facet
collection.

Another option is for the script to return a `string` controlling the
term that will be used to count against. The script execution will
include the term variable which is the current field term used.

For example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "script" : "term + 'aaa'"
            }
        }
    }
}
--------------------------------------------------

And using the boolean feature:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "script" : "term == 'aaa' ? true : false"
            }
        }
    }
}
--------------------------------------------------

==== Multi Fields

The term facet can be executed against more than one field, returning
the aggregation result across those fields. For example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "fields" : ["tag1", "tag2"],
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

==== Script Field

A script that provides the actual terms that will be processed for a
given doc. A `script_field` (or `script` which will be used when no
`field` or `fields` are provided) can be set to provide it.

As an example, a search request (that is quite "heavy") can be executed
and use either `_source` itself or `_fields` (for stored fields) without
needing to load the terms to memory (at the expense of much slower
execution of the search, and causing more IO load):

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "my_facet" : {
            "terms" : {
                "script_field" : "_source.my_field",
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

Or:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "my_facet" : {
            "terms" : {
                "script_field" : "_fields['my_field']",
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

Note also, that the above will use the whole field value as a single
term.

==== _index

The term facet allows to specify a special field name called `_index`.
This will return a facet count of hits per `_index` the search was
executed on (relevant when a search request spans more than one index).

==== Memory Considerations

Term facet causes the relevant field values to be loaded into memory.
This means that per shard, there should be enough memory to contain
them. It is advisable to explicitly set the fields to be `not_analyzed`
or make sure the number of unique tokens a field can have is not large.
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`[[search-facets-terms-facet]]`
			`=== Terms Facet`

			`Allow to specify field facets that return the N most frequent terms. For`
			`example:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"size" : 10`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`It is preferred to have the terms facet executed on a non analyzed`
			`field, or a field without a large number of terms it breaks to.`

introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. "shard_size" > "size" will increase the accuracy (both in terms of the counts associated with each term and the terms that will actually be returned the user) - of course, the higher "shard_size" is, the more expensive the processing becomes as bigger queues are maintained on a shard level and larger lists are streamed back from the shards. closes #3821 2013-09-29 20:14:34 -04:00			`==== Accuracy Control`

			The `size` parameter defines how many top terms should be returned out
			`of the overall terms list. By default, the node coordinating the`
			search process will ask each shard to provide its own top `size` terms
Documentation typo. 2014-01-20 05:18:26 -05:00			`and once all shards respond, it will reduce the results to the final list`
introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. "shard_size" > "size" will increase the accuracy (both in terms of the counts associated with each term and the terms that will actually be returned the user) - of course, the higher "shard_size" is, the more expensive the processing becomes as bigger queues are maintained on a shard level and larger lists are streamed back from the shards. closes #3821 2013-09-29 20:14:34 -04:00			`that will then be sent back to the client. This means that if the number`
			of unique terms is greater than `size`, the returned list is slightly off
			`and not accurate (it could be that the term counts are slightly off and it`
			could even be that a term that should have been in the top `size` entries
			`was not returned).`

			The higher the requested `size` is, the more accurate the results will be,
			`but also, the more expensive it will be to compute the final results (both`
			`due to bigger priority queues that are managed on a shard level and due to`
			`bigger data transfers between the nodes and the client). In an attempt to`
- Added docs for the value_count aggregation - Fixed typos in the terms facets docs - Fixed aggregation docs layout - Added docs for shard_size in term aggregation 2013-11-29 06:35:25 -05:00			minimize the extra work that comes with bigger requested `size` the
			`shard_size` parameter was introduced. When defined, it will determine
			`how many terms the coordinating node will request from each shard. Once`
introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. "shard_size" > "size" will increase the accuracy (both in terms of the counts associated with each term and the terms that will actually be returned the user) - of course, the higher "shard_size" is, the more expensive the processing becomes as bigger queues are maintained on a shard level and larger lists are streamed back from the shards. closes #3821 2013-09-29 20:14:34 -04:00			`all the shards responded, the coordinating node will then reduce them`
			to a final result which will be based on the `size` parameter - this way,
- Added docs for the value_count aggregation - Fixed typos in the terms facets docs - Fixed aggregation docs layout - Added docs for shard_size in term aggregation 2013-11-29 06:35:25 -05:00			`one can increase the accuracy of the returned terms and avoid the overhead`
introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. "shard_size" > "size" will increase the accuracy (both in terms of the counts associated with each term and the terms that will actually be returned the user) - of course, the higher "shard_size" is, the more expensive the processing becomes as bigger queues are maintained on a shard level and larger lists are streamed back from the shards. closes #3821 2013-09-29 20:14:34 -04:00			`of streaming a big list of terms back to the client.`

			Note that `shard_size` cannot be smaller than `size`... if that's the case
			elasticsearch will override it and reset it to be equal to `size`.


Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`==== Ordering`

			`Allow to control the ordering of the terms facets, to be ordered by`
			`count`, `term`, `reverse_count` or `reverse_term`. The default is
			`count`. Here is an example:

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"size" : 10,`
			`"order" : "term"`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`==== All Terms`

			`Allow to get all the terms in the terms facet, ones that do not match a`
			`hit, will have a count of 0. Note, this should not be used with fields`
			`that have many terms.`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"all_terms" : true`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`==== Excluding Terms`

			`It is possible to specify a set of terms that should be excluded from`
			`the terms facet request result:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"exclude" : ["term1", "term2"]`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`==== Regex Patterns`

			`The terms API allows to define regex expression that will control which`
			`terms will be included in the faceted list, here is an example:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"regex" : "_regex expression here_",`
			`"regex_flags" : "DOTALL"`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`Check`
			`http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary[Java`
			Pattern API] for more details about `regex_flags` options.

			`==== Term Scripts`

			`Allow to define a script for terms facet to process the actual term that`
			`will be used in the term facet collection, and also optionally control`
			`its inclusion or not.`

			The script can either return a boolean value, with `true` to include it
			in the facet collection, and `false` to exclude it from the facet
			`collection.`

			Another option is for the script to return a `string` controlling the
			`term that will be used to count against. The script execution will`
			`include the term variable which is the current field term used.`

			`For example:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"size" : 10,`
			`"script" : "term + 'aaa'"`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`And using the boolean feature:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"field" : "tag",`
			`"size" : 10,`
			`"script" : "term == 'aaa' ? true : false"`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`==== Multi Fields`

			`The term facet can be executed against more than one field, returning`
			`the aggregation result across those fields. For example:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"tag" : {`
			`"terms" : {`
			`"fields" : ["tag1", "tag2"],`
			`"size" : 10`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`==== Script Field`

			`A script that provides the actual terms that will be processed for a`
			given doc. A `script_field` (or `script` which will be used when no
			`field` or `fields` are provided) can be set to provide it.

			`As an example, a search request (that is quite "heavy") can be executed`
			and use either `_source` itself or `_fields` (for stored fields) without
			`needing to load the terms to memory (at the expense of much slower`
			`execution of the search, and causing more IO load):`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"my_facet" : {`
			`"terms" : {`
			`"script_field" : "_source.my_field",`
			`"size" : 10`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`Or:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"query" : {`
			`"match_all" : { }`
			`},`
			`"facets" : {`
			`"my_facet" : {`
			`"terms" : {`
			`"script_field" : "_fields['my_field']",`
			`"size" : 10`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`Note also, that the above will use the whole field value as a single`
			`term.`

			`==== _index`

			The term facet allows to specify a special field name called `_index`.
			This will return a facet count of hits per `_index` the search was
			`executed on (relevant when a search request spans more than one index).`

			`==== Memory Considerations`

			`Term facet causes the relevant field values to be loaded into memory.`
			`This means that per shard, there should be enough memory to contain`
			them. It is advisable to explicitly set the fields to be `not_analyzed`
			`or make sure the number of unique tokens a field can have is not large.`