Docs: Rewrote the filtered query docs to be clearer

Closes #1688
Clinton Gormley 2014-06-19 16:33:57 +02:00
parent 8ccfca3a2f
commit adf6e794b6
1 changed files with 134 additions and 29 deletions

@@ -1,54 +1,159 @@
[[query-dsl-filtered-query]]
=== Filtered Query
A query that applies a filter to the results of another query. This
query maps to Lucene `FilteredQuery`.
The `filtered` query is used to combine another query with any
<<query-dsl-filters,filter>>. Filters are usually faster than queries because:
* they don't have to calculate the relevance `_score` for each document --
the answer is just a boolean ``Yes, the document matches the filter'' or
``No, the document does not match the filter''.
* the results from most filters can be cached in memory, making subsequent
executions faster.
TIP: Exclude as many documents as you can with a filter, then query just the
documents that remain.
[source,js]
--------------------------------------------------
{
  "filtered": {
    "query": {
      "match": { "tweet": "full text search" }
    },
    "filter": {
      "range": { "created": { "gte": "now - 1d / d" }}
    }
  }
}
--------------------------------------------------
The `filtered` query can be used wherever a `query` is expected, for instance,
to use the above example in search request:
[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_search -d '
{
"query": {
"filtered": { <1>
"query": {
"match": { "tweet": "full text search" }
},
"filter": {
"range": { "created": { "gte": "now - 1d / d" }}
}
}
}
}
'
--------------------------------------------------
<1> The `filtered` query is passed as the value of the `query`
parameter in the search request.
==== Filtering without a query
If a `query` is not specified, it defaults to the
<<query-dsl-match-all-query,`match_all` query>>. This means that the
`filtered` query can be used to wrap just a filter, so that it can be used
wherever a query is expected.
[source,js]
--------------------------------------------------
curl -XGET localhost:9200/_search -d '
{
"query": {
"filtered": { <1>
"filter": {
"range": { "created": { "gte": "now - 1d / d" }}
}
}
}
}
'
--------------------------------------------------
<1> No `query` has been specified, so this request applies just the filter,
returning all documents created since yesterday.
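When request bodies are assembled in client code, the optional `query` clause can simply be left out. The Python sketch below shows one way this might be done. The helper name `filtered_query` is hypothetical and not part of any Elasticsearch client; it only builds the JSON body and does not send it.

```python
import json


def filtered_query(filter_clause, query_clause=None):
    """Build a `filtered` query body (hypothetical helper, not a client API).

    When `query_clause` is None the `query` key is omitted, so the
    `filtered` query falls back to `match_all` as described above.
    """
    body = {"filter": filter_clause}
    if query_clause is not None:
        body["query"] = query_clause
    return {"filtered": body}


recent = {"range": {"created": {"gte": "now - 1d / d"}}}

# Filter only: no `query` key is emitted.
filter_only = {"query": filtered_query(recent)}

# Query plus filter, as in the first search request.
with_query = {
    "query": filtered_query(recent, {"match": {"tweet": "full text search"}})
}

print(json.dumps(filter_only, indent=2))
```

The printed body can be passed to a search request in the same way as the `curl` examples above.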
==== Multiple filters
Multiple filters can be applied by wrapping them in a
<<query-dsl-bool-filter,`bool` filter>>, for example:
[source,js]
--------------------------------------------------
{
"filtered": {
"query": { "match": { "tweet": "full text search" }},
"filter": {
"bool": {
"must": { "range": { "created": { "gte": "now - 1d / d" }}},
"should": [
{ "term": { "featured": true }},
{ "term": { "starred": true }}
],
"must_not": { "term": { "deleted": false }}
}
}
}
}
--------------------------------------------------
The filter object can hold only filter elements, not queries. Filters
can be much faster compared to queries since they don't perform any
scoring, especially when they are cached.
Similarly, multiple queries can be combined with a
<<query-dsl-bool-query,`bool` query>>.
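As a rough illustration of composing several filters programmatically, the sketch below builds a `bool` filter from lists of clauses and places it inside a `filtered` query. The helper `bool_filter` is a hypothetical name introduced for this example, and the field names are taken from the example above.

```python
def bool_filter(must=None, should=None, must_not=None):
    """Combine filter clauses into a `bool` filter (hypothetical helper)."""
    clauses = {}
    for name, value in (("must", must), ("should", should), ("must_not", must_not)):
        if value:  # skip clause types that were not supplied
            clauses[name] = value
    return {"bool": clauses}


combined = {
    "filtered": {
        "query": {"match": {"tweet": "full text search"}},
        "filter": bool_filter(
            must=[{"range": {"created": {"gte": "now - 1d / d"}}}],
            should=[{"term": {"featured": True}}, {"term": {"starred": True}}],
        ),
    }
}
```

Clause types that are not supplied are left out of the body entirely, which keeps the generated JSON close to what would be written by hand.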
==== Filter strategy
The `filtered` query lets you configure how the filter is intersected with the
query. You can control how the filter and query are executed with the
`strategy` parameter:
[source,js]
--------------------------------------------------
{
    "filtered" : {
        "query" : { ... },
        "filter" : { ... },
        "strategy": "leap_frog"
    }
}
--------------------------------------------------
IMPORTANT: This is an _expert-level_ setting. Most users can simply ignore it.
The `strategy` parameter accepts the following options:
[horizontal]
`leap_frog_query_first`::
Look for the first document matching the query, and then alternately
advance the query and the filter to find common matches.
`leap_frog_filter_first`::
Look for the first document matching the filter, and then alternately
advance the query and the filter to find common matches.
`leap_frog`::
Same as `leap_frog_query_first`.
`query_first`::
If the filter supports random access, then search for documents using the
query, and then consult the filter to check whether there is a match.
Otherwise fall back to `leap_frog_query_first`.
`random_access_${threshold}`::
If the filter supports random access and if there is at least one matching
document among the first `threshold` ones, then apply the filter first.
Otherwise fall back to `leap_frog_query_first`. `${threshold}` must be
greater than or equal to `1`.
`random_access_always`::
Apply the filter first if it supports random access. Otherwise fall back
to `leap_frog_query_first`.
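To make the "leap frog" idea more concrete, the following Python sketch intersects two sorted lists of document IDs by alternately advancing whichever side is behind. It is a simplified model for illustration only, not Lucene's implementation: the sorted lists, the function names, and the symmetric treatment of both sides are assumptions of this example.

```python
from bisect import bisect_left


def advance(doc_ids, start, target):
    """Return the index of the first entry >= target, searching from start."""
    return bisect_left(doc_ids, target, start)


def leap_frog(query_docs, filter_docs):
    """Intersect two sorted doc ID lists by alternately advancing each side."""
    matches = []
    qi = fi = 0
    while qi < len(query_docs) and fi < len(filter_docs):
        q, f = query_docs[qi], filter_docs[fi]
        if q == f:
            matches.append(q)  # both sides agree on this document
            qi += 1
            fi += 1
        elif q < f:
            qi = advance(query_docs, qi, f)  # query leaps to the filter's doc
        else:
            fi = advance(filter_docs, fi, q)  # filter leaps to the query's doc
    return matches


print(leap_frog([1, 4, 7, 9, 15], [2, 4, 9, 10, 15, 20]))  # [4, 9, 15]
```

In this model neither side is ever scanned document by document; each side skips directly to the other's current candidate, which is why the approach suits filters that can be advanced cheaply.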
The default strategy is to use `query_first` on filters that are not
advanceable such as geo filters and script filters, and `random_access_100` on
other filters.
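Because `random_access_${threshold}` embeds a number in the strategy name, a client may want to check a value before sending it. The sketch below is one possible client-side check based only on the options listed above. The function name `is_valid_strategy` is hypothetical, and how the server treats an invalid value is not covered here.

```python
import re

# Strategy names taken from the list of options above.
FIXED_STRATEGIES = {
    "leap_frog",
    "leap_frog_query_first",
    "leap_frog_filter_first",
    "query_first",
    "random_access_always",
}
RANDOM_ACCESS = re.compile(r"random_access_(\d+)")


def is_valid_strategy(name):
    """Return True if `name` matches one of the documented strategy options."""
    if name in FIXED_STRATEGIES:
        return True
    match = RANDOM_ACCESS.fullmatch(name)
    # The threshold must be greater than or equal to 1.
    return bool(match) and int(match.group(1)) >= 1


print(is_valid_strategy("random_access_100"))
print(is_valid_strategy("random_access_0"))
```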