2013-08-28 19:24:34 -04:00

[[search-facets-terms-facet]]
=== Terms Facet

The terms facet returns the N most frequent terms for a given field. For
example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

It is preferable to run the terms facet on a non-analyzed field, or on a
field that analysis does not break into a large number of terms.

==== Accuracy Control

The `size` parameter defines how many top terms should be returned out
of the overall terms list. By default, the node coordinating the
search process asks each shard to provide its own top `size` terms
and, once all shards respond, reduces the results to the final list
that is then sent back to the client. This means that if the number
of unique terms is greater than `size`, the returned list may not be
accurate: the term counts can be slightly off, and a term that should
have been in the top `size` entries may be missing altogether. For
example, a term that ranks just below the top `size` on every shard is
never reported by any shard, even if its combined count would place it
in the overall top `size`.

The higher the requested `size`, the more accurate the results will be,
but also the more expensive it is to compute the final results (both
due to bigger priority queues managed at the shard level and due to
bigger data transfers between the nodes and the client). To minimize
the extra work that comes with a bigger requested `size`, the
`shard_size` parameter was introduced. When defined, it determines
how many terms the coordinating node requests from each shard. Once
all the shards have responded, the coordinating node reduces them
to a final result based on the `size` parameter. This way,
one can increase the accuracy of the returned terms while avoiding the
overhead of streaming a big list of terms back to the client.

Note that `shard_size` cannot be smaller than `size`. If it is,
elasticsearch overrides it and resets it to be equal to `size`.

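As a minimal sketch of the parameters described above, the following
request asks for the top 10 terms while having each shard return its
own top 50. The value 50 is only an illustrative assumption; a suitable
`shard_size` depends on the data and on the number of shards.

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "shard_size" : 50
            }
        }
    }
}
--------------------------------------------------
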
==== Ordering

The terms in the facet can be ordered by `count`, `term`,
`reverse_count` or `reverse_term`. The default is `count`. Here is an
example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "order" : "term"
            }
        }
    }
}
--------------------------------------------------

==== All Terms

Setting `all_terms` to `true` returns all the terms in the terms facet;
terms that do not match a hit will have a count of 0. Note that this
should not be used with fields that have many terms.

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "all_terms" : true
            }
        }
    }
}
--------------------------------------------------

==== Excluding Terms

It is possible to specify a set of terms that should be excluded from
the terms facet request result:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "exclude" : ["term1", "term2"]
            }
        }
    }
}
--------------------------------------------------

==== Regex Patterns

The terms facet allows defining a regular expression that controls which
terms are included in the faceted list. Here is an example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "regex" : "_regex expression here_",
                "regex_flags" : "DOTALL"
            }
        }
    }
}
--------------------------------------------------

See the
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#field_summary[Java
Pattern API] for more details about the `regex_flags` options.

==== Term Scripts

A script can be defined for the terms facet to process the actual term
that will be used in the facet collection, and optionally to control
whether the term is included at all.

The script can return a boolean value: `true` to include the term in
the facet collection, and `false` to exclude it.

Alternatively, the script can return a `string` that controls the
term that will be counted. The script execution includes the `term`
variable, which is the current field term.

For example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "script" : "term + 'aaa'"
            }
        }
    }
}
--------------------------------------------------

And using the boolean feature:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "script" : "term == 'aaa' ? true : false"
            }
        }
    }
}
--------------------------------------------------

==== Multi Fields

The terms facet can be executed against more than one field, returning
the aggregated result across those fields. For example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "fields" : ["tag1", "tag2"],
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

==== Script Field

A script can provide the actual terms that will be processed for a
given doc. Set a `script_field` (or `script`, which is used when no
`field` or `fields` are provided) to supply it.

As an example, a search request (that is quite "heavy") can use either
`_source` itself or `_fields` (for stored fields) without needing to
load the terms into memory, at the expense of much slower execution of
the search and more IO load:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "my_facet" : {
            "terms" : {
                "script_field" : "_source.my_field",
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

Or:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "my_facet" : {
            "terms" : {
                "script_field" : "_fields['my_field']",
                "size" : 10
            }
        }
    }
}
--------------------------------------------------

Note also that the above will use the whole field value as a single
term.

==== _index

The terms facet accepts a special field name, `_index`. This returns a
facet count of hits per `_index` the search was executed on (relevant
when a search request spans more than one index).

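A minimal sketch of such a request, assuming it is sent to a search
endpoint that spans several indices (the facet name `indices` is
arbitrary):

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "indices" : {
            "terms" : {
                "field" : "_index"
            }
        }
    }
}
--------------------------------------------------
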
==== Memory Considerations

The terms facet causes the relevant field values to be loaded into
memory. This means that, per shard, there should be enough memory to
contain them. It is advisable to explicitly set the fields to be
`not_analyzed`, or to make sure the number of unique tokens a field can
have is not large.

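As a sketch, a field can be mapped as `not_analyzed` so that each value
is indexed as a single term. The type name `doc` and the field name
`tag` below are hypothetical, and the snippet assumes the `string`
field type with its `index` mapping option, supplied when the index is
created:

[source,js]
--------------------------------------------------
{
    "mappings" : {
        "doc" : {
            "properties" : {
                "tag" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                }
            }
        }
    }
}
--------------------------------------------------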