mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-09 14:34:43 +00:00
Today if we search across a large amount of shards we hit every shard. Yet, it's quite common to search across an index pattern for time based indices but filtering will exclude all results outside a certain time range ie. `now-3d`. While the search can potentially hit hundreds of shards the majority of the shards might yield 0 results since there is not document that is within this date range. Kibana for instance does this regularly but used `_field_stats` to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results. This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
199 lines
5.4 KiB
Plaintext
199 lines
5.4 KiB
Plaintext
[[search-request-body]]
|
|
== Request Body Search
|
|
|
|
The search request can be executed with a search DSL, which includes the
|
|
<<query-dsl,Query DSL>>, within its body. Here is an
|
|
example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /twitter/tweet/_search
|
|
{
|
|
"query" : {
|
|
"term" : { "user" : "kimchy" }
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:twitter]
|
|
|
|
And here is a sample response:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 1,
|
|
"timed_out": false,
|
|
"_shards":{
|
|
"total" : 1,
|
|
"successful" : 1,
|
|
"skipped" : 0,
|
|
"failed" : 0
|
|
},
|
|
"hits":{
|
|
"total" : 1,
|
|
"max_score": 1.3862944,
|
|
"hits" : [
|
|
{
|
|
"_index" : "twitter",
|
|
"_type" : "tweet",
|
|
"_id" : "0",
|
|
"_score": 1.3862944,
|
|
"_source" : {
|
|
"user" : "kimchy",
|
|
"message": "trying out Elasticsearch",
|
|
"date" : "2009-11-15T14:12:12",
|
|
"likes" : 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 1/"took": $body.took/]
|
|
|
|
[float]
|
|
=== Parameters
|
|
|
|
[horizontal]
|
|
`timeout`::
|
|
|
|
A search timeout, bounding the search request to be executed within the
|
|
specified time value and bail with the hits accumulated up to that point
|
|
when expired. Defaults to no timeout. See <<time-units>>.
|
|
|
|
`from`::
|
|
|
|
To retrieve hits from a certain offset. Defaults to `0`.
|
|
|
|
`size`::
|
|
|
|
The number of hits to return. Defaults to `10`. If you do not care about
|
|
getting some hits back but only about the number of matches and/or
|
|
aggregations, setting the value to `0` will help performance.
|
|
|
|
`search_type`::
|
|
|
|
The type of the search operation to perform. Can be
|
|
`dfs_query_then_fetch` or `query_then_fetch`.
|
|
Defaults to `query_then_fetch`.
|
|
See <<search-request-search-type,_Search Type_>> for more.
|
|
|
|
`request_cache`::
|
|
|
|
Set to `true` or `false` to enable or disable the caching
|
|
of search results for requests where `size` is 0, ie
|
|
aggregations and suggestions (no top hits returned).
|
|
See <<shard-request-cache>>.
|
|
|
|
`terminate_after`::
|
|
|
|
The maximum number of documents to collect for each shard,
|
|
upon reaching which the query execution will terminate early. If set, the
|
|
response will have a boolean field `terminated_early` to indicate whether
|
|
the query execution has actually terminated_early. Defaults to no
|
|
terminate_after.
|
|
|
|
`batched_reduce_size`::
|
|
|
|
The number of shard results that should be reduced at once on the
|
|
coordinating node. This value should be used as a protection mechanism to
|
|
reduce the memory overhead per search request if the potential number of
|
|
shards in the request can be large.
|
|
|
|
|
|
|
|
Out of the above, the `search_type` and the `request_cache` must be passed as
|
|
query-string parameters. The rest of the search request should be passed
|
|
within the body itself. The body content can also be passed as a REST
|
|
parameter named `source`.
|
|
|
|
Both HTTP GET and HTTP POST can be used to execute search with body. Since not
|
|
all clients support GET with body, POST is allowed as well.
|
|
|
|
[float]
|
|
=== Fast check for any matching docs
|
|
|
|
In case we only want to know if there are any documents matching a
|
|
specific query, we can set the `size` to `0` to indicate that we are not
|
|
interested in the search results. Also we can set `terminate_after` to `1`
|
|
to indicate that the query execution can be terminated whenever the first
|
|
matching document was found (per shard).
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /_search?q=message:elasticsearch&size=0&terminate_after=1
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:twitter]
|
|
|
|
The response will not contain any hits as the `size` was set to `0`. The
|
|
`hits.total` will be either equal to `0`, indicating that there were no
|
|
matching documents, or greater than `0` meaning that there were at least
|
|
as many documents matching the query when it was early terminated.
|
|
Also if the query was terminated early, the `terminated_early` flag will
|
|
be set to `true` in the response.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 3,
|
|
"timed_out": false,
|
|
"terminated_early": true,
|
|
"_shards": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"skipped" : 0,
|
|
"failed": 0
|
|
},
|
|
"hits": {
|
|
"total": 1,
|
|
"max_score": 0.0,
|
|
"hits": []
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 3/"took": $body.took/]
|
|
|
|
include::request/query.asciidoc[]
|
|
|
|
include::request/from-size.asciidoc[]
|
|
|
|
include::request/sort.asciidoc[]
|
|
|
|
include::request/source-filtering.asciidoc[]
|
|
|
|
include::request/stored-fields.asciidoc[]
|
|
|
|
include::request/script-fields.asciidoc[]
|
|
|
|
include::request/docvalue-fields.asciidoc[]
|
|
|
|
include::request/post-filter.asciidoc[]
|
|
|
|
include::request/highlighting.asciidoc[]
|
|
|
|
include::request/rescore.asciidoc[]
|
|
|
|
include::request/search-type.asciidoc[]
|
|
|
|
include::request/scroll.asciidoc[]
|
|
|
|
include::request/preference.asciidoc[]
|
|
|
|
include::request/explain.asciidoc[]
|
|
|
|
include::request/version.asciidoc[]
|
|
|
|
include::request/index-boost.asciidoc[]
|
|
|
|
include::request/min-score.asciidoc[]
|
|
|
|
include::request/named-queries-and-filters.asciidoc[]
|
|
|
|
include::request/inner-hits.asciidoc[]
|
|
|
|
include::request/collapse.asciidoc[]
|
|
|
|
include::request/search-after.asciidoc[]
|