284 lines
8.2 KiB
Plaintext
284 lines
8.2 KiB
Plaintext
[[search-facets]]
|
|
== Facets
|
|
|
|
The usual purpose of a full-text search engine is to return a small
|
|
number of documents matching your query.
|
|
|
|
_Facets_ provide aggregated data based on a search query. In the
|
|
simplest case, a
|
|
<<search-facets-terms-facet,terms facet>>
|
|
can return _facet counts_ for various _facet values_ for a specific
|
|
_field_. ElasticSearch supports more facet implementations, such as
|
|
<<search-facets-statistical-facet,statistical>>
|
|
or
|
|
<<search-facets-date-histogram-facet,date
|
|
histogram>> facets.
|
|
|
|
The field used for facet calculations _must_ be of type numeric,
|
|
date/time or be analyzed as a single token — see the
|
|
<<mapping,_Mapping_>> guide for details on the
|
|
analysis process.
|
|
|
|
You can give the facet a custom _name_ and return multiple facets in one
|
|
request.
|
|
|
|
Let's try it out with a simple example. Suppose we have a number of
|
|
articles with a field called `tags`, preferably analyzed with the
|
|
<<analysis-keyword-analyzer,keyword>>
|
|
analyzer. The facet aggregation will return counts for the most popular
|
|
tags across the documents matching your query — or across all documents
|
|
in the index.
|
|
|
|
We will store some example data first:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -X DELETE "http://localhost:9200/articles"
|
|
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
|
|
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
|
|
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'
|
|
--------------------------------------------------
|
|
|
|
Now, let's query the index for articles beginning with letter `T`
|
|
and retrieve a
|
|
<<search-facets-terms-facet,_terms facet_>>
|
|
for the `tags` field. We will name the facet simply: _tags_.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
|
|
{
|
|
"query" : { "query_string" : {"query" : "T*"} },
|
|
"facets" : {
|
|
"tags" : { "terms" : {"field" : "tags"} }
|
|
}
|
|
}
|
|
'
|
|
--------------------------------------------------
|
|
|
|
This request will return articles `Two` and `Three` (because
|
|
they match our query), as well as the `tags` facet:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
"facets" : {
|
|
"tags" : {
|
|
"_type" : "terms",
|
|
"missing" : 0,
|
|
"total": 5,
|
|
"other": 0,
|
|
"terms" : [ {
|
|
"term" : "foo",
|
|
"count" : 2
|
|
}, {
|
|
"term" : "bar",
|
|
"count" : 2
|
|
}, {
|
|
"term" : "baz",
|
|
"count" : 1
|
|
} ]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
In the `terms` array, relevant _terms_ and _counts_ are returned. You'll
|
|
probably want to display these to your users. The facet returns several
|
|
important counts:
|
|
|
|
* `missing` : The number of documents which have no value for the
|
|
faceted field +
|
|
* `total` : The total number of terms in the facet +
|
|
* `other` : The number of terms not included in the returned facet
|
|
(effectively `other` = `total` - `terms` )
|
|
|
|
Notice, that the counts are scoped to the current query: _foo_ is
|
|
counted only twice (not three times), _bar_ is counted twice and _baz_
|
|
once. Also note that terms are counted once per document, even if the
|
|
occur more frequently in that document.
|
|
|
|
That's because the primary purpose of facets is to enable
|
|
http://en.wikipedia.org/wiki/Faceted_search[_faceted navigation_],
|
|
allowing the user to refine her query based on the insight from the
|
|
facet, i.e. restrict the search to a specific category, price or date
|
|
range. Facets can be used, however, for other purposes: computing
|
|
histograms, statistical aggregations, and more. See the blog about
|
|
link:/blog/data-visualization-with-elasticsearch-and-protovis/[data visualization].for inspiration.
|
|
|
|
|
|
|
|
[float]
|
|
=== Scope
|
|
|
|
As we have already mentioned, facet computation is restricted to the
|
|
scope of the current query, called `main`, by default. Facets can be
|
|
computed within the `global` scope as well, in which case it will return
|
|
values computed across all documents in the index:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"facets" : {
|
|
"my_facets" : {
|
|
"terms" : { ... },
|
|
"global" : true <1>
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
<1> The `global` keyword can be used with any facet type.
|
|
|
|
There's one *important distinction* to keep in mind. While search
|
|
_queries_ restrict both the returned documents and facet counts, search
|
|
_filters_ restrict only returned documents — but _not_ facet counts.
|
|
|
|
If you need to restrict both the documents and facets, and you're not
|
|
willing or able to use a query, you may use a _facet filter_.
|
|
|
|
[float]
|
|
=== Facet Filter
|
|
|
|
All facets can be configured with an additional filter (explained in the
|
|
<<query-dsl,Query DSL>> section), which _will_ reduce
|
|
the documents they use for computing results. An example with a _term_
|
|
filter:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"facets" : {
|
|
"<FACET NAME>" : {
|
|
"<FACET TYPE>" : {
|
|
...
|
|
},
|
|
"facet_filter" : {
|
|
"term" : { "user" : "kimchy"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Note that this is different from a facet of the
|
|
<<search-facets-filter-facet,filter>> type.
|
|
|
|
[float]
|
|
=== Facets with the _nested_ types
|
|
|
|
<<mapping-nested-type,Nested>> mapping allows
|
|
for better support for "inner" documents faceting, especially when it
|
|
comes to multi valued key and value facets (like histograms, or term
|
|
stats).
|
|
|
|
What is it good for? First of all, this is the only way to use facets on
|
|
nested documents once they are used (possibly for other reasons). But,
|
|
there is also facet specific reason why nested documents can be used,
|
|
and that's the fact that facets working on different key and value field
|
|
(like term_stats, or histogram) can now support cases where both are
|
|
multi valued properly.
|
|
|
|
For example, let's use the following mapping:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"type1" : {
|
|
"properties" : {
|
|
"obj1" : {
|
|
"type" : "nested"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
And, here is a sample data:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"obj1" : [
|
|
{
|
|
"name" : "blue",
|
|
"count" : 4
|
|
},
|
|
{
|
|
"name" : "green",
|
|
"count" : 6
|
|
}
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
|
|
|
|
[float]
|
|
==== All Nested Matching Root Documents
|
|
|
|
Another option is to run the facet on all the nested documents matching
|
|
the root objects that the main query will end up producing. For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"query": {
|
|
"match_all": {}
|
|
},
|
|
"facets": {
|
|
"facet1": {
|
|
"terms_stats": {
|
|
"key_field" : "name",
|
|
"value_field": "count"
|
|
},
|
|
"nested": "obj1"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The `nested` element provides the path to the nested document (can be a
|
|
multi level nested docs) that will be used.
|
|
|
|
Facet filter allows you to filter your facet on the nested object level.
|
|
It is important that these filters match on the nested object level and
|
|
not on the root document level. In the following example the
|
|
`terms_stats` only applies on nested objects with the name 'blue'.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"query": {
|
|
"match_all": {}
|
|
},
|
|
"facets": {
|
|
"facet1": {
|
|
"terms_stats": {
|
|
"key_field" : "name",
|
|
"value_field": "count"
|
|
},
|
|
"nested": "obj1",
|
|
"facet_filter" : {
|
|
"term" : {"name" : "blue"}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
include::facets/terms-facet.asciidoc[]
|
|
|
|
include::facets/range-facet.asciidoc[]
|
|
|
|
include::facets/histogram-facet.asciidoc[]
|
|
|
|
include::facets/date-histogram-facet.asciidoc[]
|
|
|
|
include::facets/filter-facet.asciidoc[]
|
|
|
|
include::facets/query-facet.asciidoc[]
|
|
|
|
include::facets/statistical-facet.asciidoc[]
|
|
|
|
include::facets/terms-stats-facet.asciidoc[]
|
|
|
|
include::facets/geo-distance-facet.asciidoc[]
|
|
|