Rewrote post-filter.asciidoc

Closes #5166
This commit is contained in:
Clinton Gormley 2014-07-31 12:48:55 +02:00
parent 521f8b28b5
commit 36e1c7928c
1 changed files with 90 additions and 66 deletions

View File

@ -1,90 +1,114 @@
[[search-request-post-filter]] [[search-request-post-filter]]
=== Post filter === Post filter
The `post_filter` allows any filter that it holds to be executed as last filter, because The `post_filter` is applied to the search `hits` at the very end of a search
of this the `post_filter` only has affect on the search hits and not facets. request, after aggregations have already been calculated. It's purpose is
best explained by example:
There are several reasons why to specify filters as `post_filter`. One reason is to force Imagine that you are selling shirts, and the user has specified two filters:
expensive filters to be executed as last filter, so that these filters only operate on the `color:red` and `brand:gucci`. You only want to show them red shirts made by
docs that match with the rest of the query. An example of for what filter a post_filter Gucci in the search results. Normally you would do this with a
should be used for this reason is the `geo_distance` filter. The `geo_distance` filter is <<query-dsl-filtered-query,`filtered` query>>:
in general an expensive filter to execute and to reduce the execution time for this filter,
one can choose to specify it as `post_filter`, so it runs on documents that are very likely
to end up as matching documents.
Another important reason is when doing things like facet navigation, [source,json]
sometimes only the hits are needed to be filtered by the chosen facet,
and all the facets should continue to be calculated based on the original query.
The `post_filter` element within the search request can be used to accomplish it.
Note, this is different compared to creating a `filtered` query with the
filter, since this will cause the facets to only process the filtered
results.
For example, let's create two tweets, with two different tags:
[source,js]
-------------------------------------------------- --------------------------------------------------
curl -XPUT 'localhost:9200/twitter/tweet/1' -d ' curl -XGET localhost:9200/shirts/_search -d '
{ {
"message" : "something blue", "query": {
"tag" : "blue" "filtered": {
"filter": {
"bool": {
"must": [
{ "term": { "color": "red" }},
{ "term": { "brand": "gucci" }}
]
}
}
}
}
} }
' '
curl -XPUT 'localhost:9200/twitter/tweet/2' -d '
{
"message" : "something green",
"tag" : "green"
}
'
curl -XPOST 'localhost:9200/_refresh'
-------------------------------------------------- --------------------------------------------------
We can now search for something, and have a terms facet. However, you would also like to use _faceted navigation_ to display a list of
other options that the user could click on. Perhaps you have a `model` field
that would allow the user to limit their search results to red Gucci
`t-shirts` or `dress-shirts`.
[source,js] This can be done with a
<<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>:
[source,json]
-------------------------------------------------- --------------------------------------------------
curl -XPOST 'localhost:9200/twitter/_search?pretty=true' -d ' curl -XGET localhost:9200/shirts/_search -d '
{ {
"query" : { "query": {
"term" : { "message" : "something" } "filtered": {
"filter": {
"bool": {
"must": [
{ "term": { "color": "red" }},
{ "term": { "brand": "gucci" }}
]
}
}
}
}, },
"facets" : { "aggs": {
"tag" : { "models": {
"terms" : { "field" : "tag" } "terms": { "field": "model" } <1>
} }
} }
} }
' '
-------------------------------------------------- --------------------------------------------------
<1> Returns the most popular models of red shirts by Gucci.
We get two hits, and the relevant facets with a count of 1 for both But perhaps you would also like to tell the user how many Gucci shirts are
`green` and `blue`. Now, let's say the `green` facet is chosen, we can available in *other colors*. If you just add a `terms` aggregation on the
simply add a filter for it: `color` field, you will only get back the color `red`, because your query
returns only red shirts by Gucci.
[source,js] Instead, you want to include shirts of all colors during aggregation, then
apply the `colors` filter only to the search results. This is the purpose of
the `post_filter`:
[source,json]
-------------------------------------------------- --------------------------------------------------
curl -XPOST 'localhost:9200/twitter/_search?pretty=true' -d ' curl -XGET localhost:9200/shirts/_search -d '
{ {
"query" : { "query": {
"term" : { "message" : "something" } "filtered": {
}, "filter": {
"post_filter" : { { "term": { "brand": "gucci" }} <1>
"term" : { "tag" : "green" }
},
"facets" : {
"tag" : {
"terms" : { "field" : "tag" }
} }
} }
},
"aggs": {
"colors": {
"terms": { "field": "color" }, <2>
},
"color_red": {
"filter": {
"term": { "color": "red" } <3>
},
"aggs": {
"models": {
"terms": { "field": "model" } <3>
}
}
}
},
"post_filter": { <4>
"term": { "color": "red" },
}
} }
' '
-------------------------------------------------- --------------------------------------------------
<1> The main query now finds all shirts by Gucci, regardless of color.
And now, we get only 1 hit back, but the facets remain the same. <2> The `colors` agg returns popular colors for shirts by Gucci.
<3> The `color_red` agg limits the `models` sub-aggregation
Note, if additional filters are required on specific facets, they can be to *red* Gucci shirts.
added as a `facet_filter` to the relevant facets. <4> Finally, the `post_filter` removes colors other than red
from the search `hits`.